Abstract

We prove new results concerning the relation between bifix codes, episturmian words and subgroups of free groups. We study bifix codes in factorial sets of words. We generalize most properties of ordinary maximal bifix codes to bifix codes maximal in a recurrent set $F$ of words ($F$-maximal bifix codes). In the case of bifix codes contained in Sturmian sets of words, we obtain several new results. Let $F$ be a Sturmian set of words, defined as the set of factors of a strict episturmian word. Our results express the fact that an $F$-maximal bifix code of degree $d$ behaves just as the set of words of $F$ of length $d$. An $F$-maximal bifix code of degree $d$ in a Sturmian set of words on an alphabet with $k$ letters has $(k-1)d+1$ elements. This generalizes the fact that a Sturmian set contains $(k-1)d+1$ words of length $d$. Moreover, given an infinite word $x$, if there is a finite maximal bifix code $X$ of degree $d$ such that $x$ has at most $d$ factors of length $d$ in $X$, then $x$ is ultimately periodic. Our main result states that any $F$-maximal bifix code of degree $d$ on the alphabet $A$ is the basis of a subgroup of index $d$ of the free group on $A$.
1 Introduction

This paper studies a new relation between three objects that were previously unrelated: bifix codes, episturmian words and subgroups of free groups.

We first give some background on the first two. The study of bifix codes goes back to founding papers by Schützenberger and by Gilbert and Moore [|5|. These papers already contain significant results. The first systematic study is in the papers of Schützenberger |4|. The general idea is that the submonoids generated by bifix codes are an adequate generalization of the subgroups of a group. This is illustrated by the striking fact that, under a mild restriction, the average length of a maximal bifix code with respect to a Bernoulli distribution on the alphabet is an integer. Thus, in some sense, a maximal bifix code behaves as the uniform code formed of all the words of a given length. The theory of bifix codes was developed considerably by Césari. He proved that all the finite maximal bifix codes may be obtained by internal transformations from uniform codes. He also defined the notion of derived code, which allows one to build maximal bifix codes by increasing degrees pO[ .

Sturmian words are infinite words over a binary alphabet that have exactly $n+1$ factors of length $n$ for each $n \ge 0$. Their origin can be traced back to the astronomer J. Bernoulli HI. Their first in-depth study is by Morse and Hedlund p4|]. Many combinatorial properties were described in the paper by Coven and Hedlund [|l5[. Note that, although Sturmian words appear first in the work of Morse and Hedlund, their finitary versions, Christoffel and standard words, appear much earlier in the work of Christoffel [n2| and, apparently independently, in the work of Markoff l3q, 39 ; the latter constructed the famous Markoff numbers by using them. The Markoff theory (which was designed to study minima of quadratic forms) was often revisited by mathematicians, notably by Frobenius [|4|, Dickson H. Cohn Cusick and Flahive and Bombieri Q]. There, the connection with the free group on two generators was established. Other connections of Christoffel words with the free group may be found in Osborne and Zieschang and Kassel and Reutenauer [|^. Moreover, the Sturmian morphisms (substitutions that preserve Sturmian words) are the positive endomorphisms of the free group on two generators, see Wen and Wen [0 , Mignosi and Séébold Q . Thus Sturmian words are closely related to the free group. This connection is one of the main points of the present paper.

Sturmian words were generalized to arbitrary alphabets. Following an initial work by Arnoux and Rauzy and developing ideas of De Luca , Droubay, Justin and Pirillo introduced in [22] the notion of episturmian words, which generalizes Sturmian words to arbitrary finite alphabets.

In this paper, we consider the extension of the results known for bifix codes maximal in the free monoid to bifix codes maximal in more restricted sets of words, and in particular the sets of factors of episturmian words.

We extend most properties of ordinary maximal bifix codes to bifix codes that are maximal in a recurrent set $F$ of words ($F$-maximal bifix codes). We show in particular that the average length of a finite $F$-maximal bifix code of degree $d$ in a recurrent set $F$ with respect to an invariant probability distribution on $F$ is equal to $d$ (Corollary 4.3.8).

Our main objective is the case of the set of factors of an episturmian word. We actually work with the set of factors of a strict episturmian word, called simply a Sturmian set. The number of factors of length $d$ of a strict episturmian word over an alphabet of $k$ letters is known to be $(k-1)d+1$. Our main result is that a maximal bifix code of degree $d$ in a Sturmian set over an alphabet of $k$ letters is always a basis of a subgroup of index $d$ of the free group (Theorem 6.2.1). In particular, it has $(k-1)d+1$ elements (Theorem 5.2.1). Since the set of all words of length $d$ is a maximal bifix code of degree $d$, this yields a strong generalization of the previous property. In particular, every finite maximal bifix code of degree $d$ over a two-letter alphabet contains exactly $d+1$ factors of any Sturmian word.

Finally, bifix codes $X$ contained in restricted sets of words are used to study the groups in the syntactic monoid of the submonoid $X^*$ (Theorem 7.2.3). This aspect was first considered by Schützenberger in . He studied the conditions under which parameters linked with the syntactic monoid $M$ of a finitely generated submonoid $X^*$ of a free monoid $A^*$ can be bounded in terms of $\mathrm{Card}(X)$ only. One of his results is that, apart from a special case where the group is cyclic, the cardinality of a group contained in $M$ is such a parameter. In p5| , Schützenberger conjectured a refinement of his result which was subsequently proved by Césari. This study led to the Critical Factorization Theorem that we will meet again here (Theorem 5.3..

The extension of the results concerning codes in free monoids to codes in a restricted set of words has already been considered by several authors. However, most of them have focused on general codes rather than on the particular class of bifix codes. In [49] the notion of codes of paths in a graph has been introduced. Such paths can also be viewed as words in a restricted set. The notion of a bifix code of paths has been studied in , where the internal transformation is generalized. In Q , the notion of code in a factorial set of words was introduced. The definition of a code $X$ in a factorial set $F$ requires that the set $X^*$ of all concatenations of words in $X$ is included in $F$. This approach was pushed further in . A more general notion was considered in [Q. It only requires that $X \subset F$ and that no word of $F$ has two distinct factorizations, but not necessarily that $X^* \subset F$. The connection with unambiguous automata was considered later in [|]. Codes in Sturmian sets have been studied before in . Finally, prefix codes $X$ contained in restricted sets of words are used in to study the groups in the syntactic monoid of the submonoid $X^*$.

Our paper is organized as follows.

In a first section (Section 2), we recall some definitions concerning prefix-closed, factorial, recurrent and uniformly recurrent sets, in relation with infinite words. We also introduce probability distributions on these sets.

In Section 3, we introduce prefix codes in factorial sets, especially maximal ones. We introduce some basic notions on automata. We define the average length with respect to a probability distribution on the factorial set.

In Section 4, we develop the theory of maximal bifix codes in recurrent sets. We generalize most of the properties known in the classical case. In particular, we show that the notion of degree and that of derived code can be defined (Theorem 4.3.1). We show that, for a uniformly recurrent set $F$, any $F$-thin bifix code contained in $F$ is finite (Theorem 4.4.3). In the case of Sturmian sets, we prove our main results. First, a bifix code of degree $d$ maximal in a Sturmian set on a $k$-letter alphabet has $(k-1)d+1$ elements (Theorem 5.2.1). Next, given an infinite word $x$, if there is a finite maximal bifix code $X$ of degree $d$ such that $x$ has at most $d$ factors of length $d$ in $X$, then $x$ is ultimately periodic (Corollary 5.3.3). The proof uses the Critical Factorization Theorem (see e.g. [pS], |l6| ).

Section 6 presents our results concerning free groups. Our main result (Theorem 6.2.1) in this area states that for a Sturmian set $F$, a bifix code $X \subset F$ is a finite and $F$-maximal bifix code of $F$-degree $d$ if and only if it is a basis of a subgroup of index $d$ of the free group on $A$. We finally present in Section 7 a consequence of Theorem 6.2.1 concerning syntactic groups. We show that any transitive permutation group of degree $d$ which can be generated by $k$ elements is a syntactic group of a bifix code with $(k-1)d+1$ elements (Theorem 7.2.3).

Many results of this paper are extensions or generalizations of results contained in 1^. We always give the reference of the corresponding result in . The proofs sometimes consist in the verification that the proof of the book still holds in the more general setting, and sometimes require new and more involved developments. In order to make the paper self-contained, and to avoid repetitive references to the book, we have tried to always give complete proofs.



2 Factorial sets

In this section, we introduce the basic notions of prefix-closed, factorial, recurrent and uniformly recurrent sets. These form a descending hierarchy. These notions are closely related to the analogous notions for infinite words, which are defined in Section 2.2. In Section 2.4, we introduce probability distributions on factorial sets.

We use the standard terminology and notation on words, in particular concerning prefixes, suffixes and factors (see for example ). Let $A$ be a finite alphabet. All words considered below are supposed to be on the alphabet $A$. We denote by $1$ the empty word. We denote by $A^*$ the set of all words on $A$ and by $A^+$ the set of nonempty words.

The reversal of a word $w = a_1 a_2 \cdots a_n$, where $a_1, a_2, \ldots, a_n$ are letters, is the word $\tilde{w} = a_n \cdots a_2 a_1$. In particular, the reversal of the empty word is the empty word. A set $X$ of words is closed under reversal if it contains the reversals of its elements.

Given a set $X$ of words, we define, for a word $u$, the set $u^{-1}X$ by
$$u^{-1}X = \{ y \in A^* \mid uy \in X \}.$$

Next, we say that a word is a prefix of $X$ if it is a prefix of a word of $X$.

A nonempty set $F \subset A^*$ of words is said to be prefix-closed if it contains the prefixes of all its elements. Symmetrically, it is said to be suffix-closed if it contains the suffixes of all its elements. It is said to be factorial if it contains the factors of all its elements.

The right (resp. left) order of a word $w$ with respect to $F$ is the number of letters $a$ such that $wa \in F$ (resp. $aw \in F$).

A set $F$ is said to be right essential if it is prefix-closed and if any $w \in F$ has right order at least 1. If $F$ is right essential, then for any $u \in F$ and any integer $n \ge 1$, there is a word $v$ of length $n$ such that $uv \in F$. Symmetrically, a set $F$ is said to be left essential if it is suffix-closed and if any $w \in F$ has left order at least 1.
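As an informal illustration of these definitions, the following Python sketch computes right and left orders for a set given by a membership test. The set used here (binary words without factor bb) and the names `in_F`, `right_order` and `left_order` are assumptions made for this example only.

```python
ALPHABET = "ab"

def in_F(w):
    """Membership test for an illustrative factorial set: words without factor bb."""
    return "bb" not in w

def right_order(w, member=in_F, alphabet=ALPHABET):
    """Number of letters a such that wa belongs to the set."""
    return sum(1 for a in alphabet if member(w + a))

def left_order(w, member=in_F, alphabet=ALPHABET):
    """Number of letters a such that aw belongs to the set."""
    return sum(1 for a in alphabet if member(a + w))

print(right_order("aba"))  # 2: both abaa and abab avoid bb
print(right_order("ab"))   # 1: aba avoids bb, abb does not
```

Every word of this set has right order and left order at least 1, since the letter a can always be appended or prepended.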



2.1 Recurrent sets

A set $F$ of words is said to be recurrent if it is factorial and if for every $u, w \in F$ there is a $v \in F$ such that $uvw \in F$. A recurrent set $F \ne \{1\}$ is right and left essential.

Example 2.1.1 The set $F = A^*$ is recurrent.

Example 2.1.2 Let $A = \{a, b\}$. Let $F$ be the set of words on $A$ without factor $bb$. Thus $F = A^* \setminus A^* bb A^*$. The set $F$ is recurrent. Indeed, if $u, w \in F$, then $uaw \in F$.
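The argument of Example 2.1.2 can be checked by brute force on short words. The sketch below is only a finite sanity check, not a proof; the function names and the length bound are assumptions of this illustration.

```python
from itertools import product

def words_without_bb(max_len):
    """All words over {a, b} of length at most max_len with no factor bb."""
    return ["".join(t)
            for n in range(max_len + 1)
            for t in product("ab", repeat=n)
            if "bb" not in "".join(t)]

def check_connecting_letter(max_len=5):
    """Check that u + 'a' + w avoids bb for all u, w in the set, up to max_len."""
    F = words_without_bb(max_len)
    return all("bb" not in u + "a" + w for u in F for w in F)

print(check_connecting_letter(5))
```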
A set $F$ is said to be uniformly recurrent if it is factorial and right essential and if, for any word $u \in F$, there exists an integer $n \ge 1$ such that $u$ is a factor of every word in $F \cap A^n$.

Proposition 2.1.3 A uniformly recurrent set is recurrent.

Proof. Let $u, w \in F$. Let $n$ be such that $w$ is a factor of any word in $F \cap A^n$. Since $F$ is right essential, there is a word $v$ of length $n$ such that $uv \in F$. Since $w$ is a factor of $v$, we have $v = rws$ for some words $r, s$. Thus $urw \in F$. ■

The converse of Proposition 2.1.3 is not true as shown in the example below.

Example 2.1.4 The set $F = A^*$ on $A = \{a, b\}$ is recurrent but not uniformly recurrent since $b \in F$ but $b$ is not a factor of $a^n \in F$ for any $n \ge 1$.



2.2 Recurrent words

We denote by $F(x)$ the set of factors of an infinite word $x \in A^{\mathbb N}$. The set $F(x)$ is factorial and right essential.

An infinite word $x \in A^{\mathbb N}$ is said to be recurrent if for any word $u \in F(x)$ there is a $v \in F(x)$ such that $uvu \in F(x)$. Equivalently, each factor of a recurrent word $x$ has an infinite number of occurrences in $x$.

Proposition 2.2.1 For any recurrent set $F$ there is an infinite word $x$ such that $F(x) = F$.

Proof. Set $F = \{u_1, u_2, \ldots\}$. Since $F$ is recurrent and $u_1, u_2 \in F$, there is a word $v_1$ such that $u_1 v_1 u_2 \in F$. Further, since $u_1 v_1 u_2, u_3 \in F$, there is a word $v_2$ such that $u_1 v_1 u_2 v_2 u_3 \in F$. In this way, we obtain an infinite word $x = u_1 v_1 u_2 v_2 \cdots$ such that $F(x) = F$. ■

Proposition 2.2.2 For any infinite word $x$, the set $F(x)$ is recurrent if and only if $x$ is recurrent.

Proof. Set $F = F(x)$. Suppose first that $F$ is recurrent. For any $u$ in $F$, there is a $v \in F$ such that $uvu \in F$. Thus $x$ is recurrent. Conversely, assume that $x$ is recurrent. Let $u, v$ be in $F$. Then there is a factorization $x = puy$ with $p \in F$ and $y \in A^{\mathbb N}$. Since $x$ is recurrent, the word $v$ is a factor of $y$. Set $y = qvz$ with $q \in F$ and $z \in A^{\mathbb N}$. Then $uqv$ is in $F$. Thus $F$ is recurrent. ■

An infinite word $x \in A^{\mathbb N}$ is said to be uniformly recurrent if the set $F(x)$ is uniformly recurrent. There exist recurrent infinite words which are not uniformly recurrent, as shown in the following example.
Example 2.2.3 Let $x$ be the infinite word obtained by concatenating all binary words in radix order: by increasing length, and for each length in lexicographic order. Thus, $x$ starts as follows.
$$x = ab\ aaabbabb\ aaaaababaabbbaababbbabbb \cdots$$
The infinite word $x$ is recurrent since every factor occurs infinitely often. However, $x$ is not uniformly recurrent: each $a^n$, for $n \ge 1$, is a factor of $x$, so two consecutive occurrences of, say, the letter $b$ may be arbitrarily far from each other. The word $x$ is closely related to the Champernowne word | pT[ .
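A finite prefix of the word of Example 2.2.3 can be generated directly from its definition. The sketch below is an illustration only; the function name `radix_order_prefix` is an assumption of this example. It relies on the fact that `itertools.product` enumerates tuples in lexicographic order when its input is sorted.

```python
from itertools import product

def radix_order_prefix(max_len, alphabet="ab"):
    """Concatenate all nonempty words of length 1..max_len in radix order."""
    return "".join("".join(t)
                   for n in range(1, max_len + 1)
                   for t in product(alphabet, repeat=n))

print(radix_order_prefix(3))
```

Increasing `max_len` produces longer and longer runs of the letter a, which is the obstruction to uniform recurrence noted in the example.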

We use the terms morphism and substitution interchangeably for a monoid morphism from $A^*$ into itself. Let $f : A^* \to A^*$ be a morphism and assume there is a letter $a \in A$ such that $f(a) \in aA^+$. The words $f^n(a)$ for $n \ge 1$ are prefixes of one another. If $|f^n(a)| \to \infty$ with $n$, then we denote by $f^\omega(a)$ the infinite word which has all $f^n(a)$ as prefixes. It is called a fix-point of $f$.

Example 2.2.4 Set $A = \{a, b\}$. The Thue-Morse morphism is the substitution $f : A^* \to A^*$ defined by $f(a) = ab$ and $f(b) = ba$. The Thue-Morse word $x = abbabaab \cdots$ is the fix-point $f^\omega(a)$ of $f$. It is uniformly recurrent (see [36], Example 1.5.10). We call Thue-Morse set the set of factors of the Thue-Morse word.
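The construction of a fix-point by iteration can be sketched as follows. The representation of a morphism as a Python dictionary and the names `iterate_morphism` and `THUE_MORSE` are assumptions of this illustration.

```python
def iterate_morphism(images, letter, n):
    """Return f^n(letter) for the morphism f given by the dict `images`."""
    word = letter
    for _ in range(n):
        word = "".join(images[c] for c in word)
    return word

THUE_MORSE = {"a": "ab", "b": "ba"}

# Successive iterates are prefixes of one another because f(a) starts with a.
for n in range(1, 5):
    print(iterate_morphism(THUE_MORSE, "a", n))
```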

An infinite word $x \in A^{\mathbb N}$ avoids a set $X$ of words if $F(x) \cap X = \emptyset$. We denote by $S_X$ the set of infinite words avoiding a set $X \subset A^*$. A (one-sided) shift space is a set $S$ of infinite words of the form $S_X$ for some $X \subset A^*$.

A shift space $S \subset A^{\mathbb N}$ is minimal if for any shift space $T \subset S$, one has $T = \emptyset$ or $T = S$.

For any infinite word $x \in A^{\mathbb N}$, we denote by $S(x)$ the set of infinite words $y \in A^{\mathbb N}$ such that $F(y) \subset F(x)$. The set $S(x)$ is a shift space. Indeed, we have $y \in S(x)$ if and only if $F(y) \subset F(x)$ or, equivalently, $F(y) \cap X = \emptyset$ for $X = A^* \setminus F(x)$.

The following property is standard (see for example [36], Theorem 1.5.9).

Proposition 2.2.5 An infinite word $x \in A^{\mathbb N}$ is uniformly recurrent if and only if $S(x)$ is minimal.

2.3 Episturmian words

A Sturmian word is an infinite word $x$ on a binary alphabet $A$ such that the set $F(x) \cap A^n$ has $n+1$ elements for any $n \ge 0$.

Example 2.3.1 Set $A = \{a, b\}$. The Fibonacci morphism is the substitution $f : A^* \to A^*$ defined by $f(a) = ab$ and $f(b) = a$. The Fibonacci word
$$x = abaababaabaababaababaabaababaabaab \cdots$$
is the fix-point $f^\omega(a)$ of $f$. It is a Sturmian word (see |3^] Example 2.1.1). We call Fibonacci set the set of factors of the Fibonacci word.
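The defining property of Sturmian words can be observed numerically on a finite prefix of the Fibonacci word. The sketch below is a sanity check under the assumption that the chosen prefix is long enough to contain every factor of the lengths tested; the function names are illustrative.

```python
def fibonacci_prefix(n_iter):
    """Return f^n_iter(a) for the Fibonacci morphism f(a) = ab, f(b) = a."""
    images = {"a": "ab", "b": "a"}
    word = "a"
    for _ in range(n_iter):
        word = "".join(images[c] for c in word)
    return word

def factors(word, n):
    """Set of factors of length n occurring in the finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

x = fibonacci_prefix(15)  # a prefix of length 1597
for n in range(8):
    print(n, len(factors(x, n)))  # the count is n + 1 for these small n
```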
Episturmian words are an extension of Sturmian words to arbitrary finite alphabets.

Recall that, given a set $F$ of words over an alphabet $A$, the right (resp. left) order of a word $u$ in $F$ is the number of letters $a$ such that $ua \in F$ (resp. $au \in F$). A word $u$ is right-special (resp. left-special) if its right order (resp. left order) is at least 2. A right-special (resp. left-special) word is strict if its right (resp. left) order is equal to $\mathrm{Card}(A)$. In the case of a 2-letter alphabet, all special words are strict.

By definition, an infinite word $x$ is episturmian if $F(x)$ is closed under reversal and if $F(x)$ contains, for each $n \ge 1$, at most one word of length $n$ which is right-special.

Since $F(x)$ is closed under reversal, the reversal of a right-special factor of length $n$ is left-special, and it is the only left-special factor of length $n$ of $x$. A suffix of a right-special factor is again right-special. Symmetrically, a prefix of a left-special factor is again left-special.

As a particular case, a strict episturmian word is an episturmian word $x$ with the two following properties: $x$ has exactly one right-special factor of each length and moreover each right-special factor $u$ of $x$ is strict, that is, satisfies the inclusion $uA \subset F(x)$ (see Q).

It is easy to see that for a strict episturmian word $x$ on an alphabet $A$ with $k$ letters, the set $F(x) \cap A^n$ has $(k-1)n+1$ elements for each $n$. Thus, for a binary alphabet, the strict episturmian words are just the Sturmian words, since a Sturmian word has one right-special factor for each length and its set of factors is closed under reversal.

An episturmian word $s$ is called standard if all its left-special factors are prefixes of $s$. For any episturmian word $s$, there is a standard one $t$ such that $F(s) = F(t)$. This is a rephrasing of Theorem 5 in [22].

Example 2.3.2 Consider the following generalization of the Fibonacci word to the ternary alphabet $A = \{a, b, c\}$. Consider the morphism $f : A^* \to A^*$ defined by $f(a) = ab$, $f(b) = ac$ and $f(c) = a$. The fix-point
$$f^\omega(a) = abacabaabacababacabaabacabacabaabacab \cdots$$
is the Tribonacci word. It is a strict standard episturmian word (see [52]).
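The properties of a strict episturmian word can be observed on a finite prefix of the Tribonacci word: $2n+1$ factors of length $n$ (the case $k = 3$), closure under reversal, and a single right-special factor of each length, of right order 3. The sketch below is a numerical check only, assuming the prefix used is long enough for the short lengths tested; the function names are illustrative.

```python
def tribonacci_prefix(n_iter):
    """Return f^n_iter(a) for f(a) = ab, f(b) = ac, f(c) = a."""
    images = {"a": "ab", "b": "ac", "c": "a"}
    word = "a"
    for _ in range(n_iter):
        word = "".join(images[c] for c in word)
    return word

def factors(word, n):
    """Set of factors of length n occurring in the finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def right_special(word, n, alphabet="abc"):
    """Map each right-special factor of length n to its right order."""
    longer = factors(word, n + 1)
    orders = {u: sum(u + a in longer for a in alphabet) for u in factors(word, n)}
    return {u: k for u, k in orders.items() if k >= 2}

x = tribonacci_prefix(12)  # a finite prefix, long enough for short factors
for n in range(6):
    print(n, len(factors(x, n)), right_special(x, n))
```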



The following is, in the case of Sturmian words, Proposition 2.1.25 in . The general case results from Theorems 2 and 5 in [22].

Proposition 2.3.3 An episturmian word $x$ is uniformly recurrent and $S(x)$ is minimal.

The converse is false as shown by the following example.

Example 2.3.4 The Thue-Morse word of Example 2.2.4 is not Sturmian. Indeed, it has four factors of length 2.
We recall now some notions and properties concerning episturmian words. A detailed exposition with proofs is given in |3^, |o[ ^ . See also the survey paper . For $a \in A$, denote by $\psi_a$ the morphism of $A^*$ into itself, called elementary morphism, defined by
$$\psi_a(b) = \begin{cases} ab & \text{if } b \ne a, \\ a & \text{otherwise.} \end{cases}$$
Let $\psi : A^* \to \mathrm{End}(A^*)$ be the morphism from $A^*$ into the monoid of endomorphisms of $A^*$ which maps each $a \in A$ to $\psi_a$. For $u \in A^*$, we denote by $\psi_u$ the image of $u$ by the morphism $\psi$. Thus, for three words $u, v, w$, we have $\psi_{uv}(w) = \psi_u(\psi_v(w))$.

A palindrome is a word $w$ which is equal to its reversal. Given a word $w$, we denote by $w^{(+)}$ the palindromic closure of $w$. It is, by definition, the shortest palindrome which has $w$ as a prefix.

The iterated palindromic closure of a word $w$ is the word $\mathrm{Pal}(w)$ defined recursively as follows. One has $\mathrm{Pal}(1) = 1$ and for $u \in A^*$ and $a \in A$, one has $\mathrm{Pal}(ua) = (\mathrm{Pal}(u)a)^{(+)}$. Since $\mathrm{Pal}(u)$ is a proper prefix of $\mathrm{Pal}(ua)$, it makes sense to define the iterated palindromic closure of an infinite word $x$ as the infinite word which is the limit of the iterated palindromic closures of the prefixes of $x$.

Justin's Formula is the following. For all words $u$ and $v$, one has
$$\mathrm{Pal}(uv) = \psi_u(\mathrm{Pal}(v))\,\mathrm{Pal}(u).$$
This formula extends to infinite words: if $u$ is a word and $v$ is an infinite word, then
$$\mathrm{Pal}(uv) = \psi_u(\mathrm{Pal}(v)). \tag{2.1}$$
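The palindromic closure, the map Pal and the morphisms $\psi_u$ are easy to compute on finite words, which allows Justin's Formula to be checked on small cases. The following sketch is one possible implementation; the function names `palindromic_closure`, `pal` and `psi` are assumptions of this illustration, and the exhaustive check at the end covers only short words.

```python
from itertools import product

def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        suffix = w[i:]
        if suffix == suffix[::-1]:
            # w[i:] is the longest palindromic suffix; mirror what precedes it.
            return w + w[:i][::-1]

def pal(w):
    """Iterated palindromic closure Pal(w) of a finite word."""
    result = ""
    for a in w:
        result = palindromic_closure(result + a)
    return result

def psi(u, w):
    """Image of w under psi_u, using psi_{uv}(w) = psi_u(psi_v(w))."""
    for a in reversed(u):
        w = "".join(b if b == a else a + b for b in w)
    return w

# Check Pal(uv) = psi_u(Pal(v)) Pal(u) for all words u, v of length <= 3 over {a, b, c}.
short_words = ["".join(t) for n in range(4) for t in product("abc", repeat=n)]
print(all(pal(u + v) == psi(u, pal(v)) + pal(u) for u in short_words for v in short_words))
```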

There is a precise combinatorial description of standard episturmian words (see e.g. mi).

Theorem 2.3.5 An infinite word $s$ is a standard episturmian word if and only if there exists an infinite word $\Delta = a_0 a_1 \cdots$, where the $a_n$ are letters, such that
$$s = \lim_{n \to \infty} u_n,$$
where the sequence $(u_n)$ is defined by $u_n = \mathrm{Pal}(a_0 a_1 \cdots a_{n-1})$. Moreover, the word $s$ is episturmian strict if and only if every letter appears infinitely often in $\Delta$.

The infinite word $\Delta$ is called the directive word of the standard word $s$. The description of the infinite word $s$ can be rephrased by the equation
$$s = \mathrm{Pal}(\Delta).$$
As a particular case of Justin's Formula, one has
$$u_{n+1} = \psi_{a_0 \cdots a_{n-1}}(a_n)\,u_n. \tag{2.2}$$
The words $u_n$ are the only prefixes of $s$ which are palindromes.
Example 2.3.6 The Fibonacci word $x$ of Example 2.3.1 is a standard episturmian word. It has the directive word $(ab)^\omega$, that is, $x = \mathrm{Pal}((ab)^\omega)$ [26]. The Tribonacci word of Example 2.3.2 has the directive word $\Delta = (abc)^\omega$ [32]. The corresponding sequence $(u_n)$ starts with $u_1 = a$, $u_2 = aba$, $u_3 = abacaba$. Observe that $\psi_{ab}(c) = abac$, so that indeed $u_3 = abac\,u_2$, as claimed in (2.2).

Example 2.3.7 Let $A = \{a, b, c\}$ and $\Delta = c(ab)^\omega$. Then we have $u_1 = c$, $u_2 = cac$, $u_3 = cacbcac$, $u_4 = cacbcacacbcac$. By Justin's Formula (2.1), the limit is the word $x = \psi_c(y)$, where $y = \mathrm{Pal}((ab)^\omega)$ is the Fibonacci word on $\{a, b\}$. This means that $x$ is obtained from $y$ by inserting a letter $c$ before every letter of $y$. The word $x$ is not strict. Indeed, the letters $a$ and $b$ are not right-special and the letter $c$ is not strict right-special since $cc$ is not a factor.
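The palindromic prefixes listed in Examples 2.3.6 and 2.3.7 can be recomputed from finite prefixes of the directive words. The sketch below is self-contained and illustrative; the helper names, including `palindromic_prefixes`, are assumptions of this example.

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]

def pal(w):
    """Iterated palindromic closure of a finite word."""
    result = ""
    for a in w:
        result = palindromic_closure(result + a)
    return result

def psi(u, w):
    """Image of w under psi_u, applying the last letter of u first."""
    for a in reversed(u):
        w = "".join(b if b == a else a + b for b in w)
    return w

def palindromic_prefixes(directive, count):
    """u_1, ..., u_count for a finite prefix `directive` of the directive word."""
    return [pal(directive[:n]) for n in range(1, count + 1)]

print(palindromic_prefixes("abc", 3))   # Tribonacci, directive word (abc)^omega
print(palindromic_prefixes("caba", 4))  # Example 2.3.7, directive word c(ab)^omega
```

Equation (2.2) corresponds to the identity `pal(d[:n + 1]) == psi(d[:n], d[n]) + pal(d[:n])` for a finite directive prefix `d`.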



Example 2.3.8 Let $A = \{a, b, c\}$ and $\Delta = abc^\omega$. It is easily checked that $\mathrm{Pal}(\Delta)$ is the periodic word $(abac)^\omega$. The only right-special factors of this word are $1$ and $a$ (p^).



2.4 Probability distributions

Let $F \subset A^*$ be a prefix-closed set of words. For $w \in F$, denote by $S(w)$ the set $S(w) = \{a \in A \mid wa \in F\}$. A right probability distribution on $F$ is a map $\pi : F \to [0, 1]$ such that

(i) $\pi(1) = 1$,

(ii) $\sum_{a \in S(w)} \pi(wa) = \pi(w)$, for any $w \in F$.

For a right probability distribution $\pi$ on $F$ and a set $X \subset F$, we denote $\pi(X) = \sum_{x \in X} \pi(x)$. See @ for elementary properties of right probability distributions. Note in particular that for any $u \in F$ and $n \ge 0$, one has, as a consequence of condition (ii),
$$\pi(uA^n \cap F) = \pi(u). \tag{2.3}$$
In particular, if $\pi$ is a right probability distribution on $F$, then $\pi(F \cap A^n) = 1$ for all $n \ge 0$.

The distribution is said to be positive on $F$ if $\pi(x) > 0$ for any $x \in F$.

Symmetrically, for a suffix-closed set $F$, a left probability distribution is a map $\pi : F \to [0, 1]$ satisfying condition (i) above and

(iii) $\sum_{a \in P(w)} \pi(aw) = \pi(w)$, for any $w \in F$,

with $P(w) = \{a \in A \mid aw \in F\}$.

When $F$ is factorial, an invariant probability distribution is both a left and a right probability distribution.



Proposition 2.4.1 For any right essential set $F$ of words, there exists a positive right probability distribution $\pi$ on $F$.

Proof. Consider the map $\pi : F \to [0, 1]$ defined for $w = a_1 a_2 \cdots a_n$ by
$$\pi(w) = \frac{1}{d_0 d_1 \cdots d_{n-1}}$$
where $d_i = \mathrm{Card}(S(a_1 \cdots a_i))$ for $0 \le i < n$. Since $F$ is right essential, $d_i \ge 1$ for $0 \le i < n$. By convention, $\pi(1) = 1$.

Let us verify that $\pi$ is a right probability distribution on $F$. Indeed, let $w = a_1 a_2 \cdots a_n$. The set $S(w)$ is nonempty. For $a \in S(w)$, we have $\pi(wa) = 1/(d_0 d_1 \cdots d_n)$. Since $\mathrm{Card}(S(w)) = d_n$, we obtain that $\pi$ satisfies condition (ii) and thus it is a right probability distribution. It is clearly positive. ■
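The construction used in this proof can be carried out explicitly on a small example. The sketch below applies it to the set of binary words without factor bb (an assumption of this illustration, as are the function names) and uses exact rational arithmetic to check condition (ii) on short words.

```python
from fractions import Fraction
from itertools import product

ALPHABET = "ab"

def in_F(w):
    """Illustrative right essential set: words without factor bb."""
    return "bb" not in w

def successors(w):
    """S(w): the letters a such that wa is in F."""
    return [a for a in ALPHABET if in_F(w + a)]

def pi(w):
    """pi(w) = 1 / (d_0 d_1 ... d_{n-1}) with d_i = Card(S(a_1 ... a_i))."""
    value = Fraction(1)
    for i in range(len(w)):
        value /= len(successors(w[:i]))
    return value

def words_of_F(n):
    """Words of length n in F."""
    return ["".join(t) for t in product(ALPHABET, repeat=n) if in_F("".join(t))]

# Condition (ii) on all words of F of length at most 4.
print(all(sum(pi(w + a) for a in successors(w)) == pi(w)
          for n in range(5) for w in words_of_F(n)))
```

This distribution is a right probability distribution but need not be invariant; the existence of invariant distributions is a separate question.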

We will now turn to the existence of positive invariant probability distributions.

A topological dynamical system is a pair $(S, \sigma)$ of a compact metric space $S$ and a continuous map $\sigma$ from $S$ into $S$. Any shift space $S$ becomes a topological dynamical system when it is equipped with the shift map defined by $\sigma(x_0 x_1 \cdots) = x_1 x_2 \cdots$. Indeed, we consider $A^{\mathbb N}$ as a metric space for the distance defined for $x = x_0 x_1 \cdots$ and $y = y_0 y_1 \cdots$ by $d(x, y) = 0$ if $x = y$ and $d(x, y) = 2^{-n}$ otherwise, where $n$ is the least integer such that $x_n \ne y_n$.

A subset $T$ of a topological dynamical system $(S, \sigma)$ is said to be stable under $\sigma$, or stable for short, if $\sigma(T) \subset T$. A stable subset is also called (topologically) invariant.

The following property is well known (although usually stated for two-sided infinite words, see for example Proposition 1.5.1 in [36]).

Proposition 2.4.2 The shift spaces are the stable and closed subsets of $(A^{\mathbb N}, \sigma)$.

Proof. It is clear that a shift space is both closed and stable. Conversely, let $S \subset A^{\mathbb N}$ be closed and stable under the shift. Let $X$ be the set of words which are not factors of words of $S$. Then $S = S_X$. Indeed, if $y \in S$, then $F(y) \cap X = \emptyset$ and thus $y \in S_X$. Conversely, let $y \in S_X$. Let $w_n$ be the prefix of length $n$ of $y$. Since $w_n \notin X$, there is an infinite word $y^{(n)} \in S$ such that $w_n \in F(y^{(n)})$. Since $S$ is stable under the shift, we may assume that $w_n$ is a prefix of $y^{(n)}$. The sequence $y^{(n)}$ converges to $y$. Since $S$ is closed, this forces $y \in S$. ■

Let $S$ be a metric space. The family of Borel subsets of $S$ is the smallest family $\mathcal{F}$ of subsets of $S$ containing the open sets and closed under complement and countable union. A function $\mu$ from $\mathcal{F}$ to $\mathbb{R}$ is said to be countably additive if $\mu(\bigcup_{n \ge 0} X_n) = \sum_{n \ge 0} \mu(X_n)$ for any sequence $(X_n)$ of pairwise disjoint Borel subsets of $S$. A Borel probability measure on $S$ is a function $\mu$ from $\mathcal{F}$ into $[0, 1]$ which is countably additive and such that $\mu(S) = 1$.

Let $(S, \sigma)$ be a topological dynamical system. A Borel probability measure $\mu$ on $S$ is said to be invariant if $\mu(\sigma^{-1}(B)) = \mu(B)$ for any $B \in \mathcal{F}$. Note that since $\sigma$ is continuous, $\sigma^{-1}(B) \in \mathcal{F}$ and thus $\mu(\sigma^{-1}(B))$ is well defined.

The following result is from |^, Theorem 4.2].
Theorem 2.4.3 For any topological dynamical system, there exist invariant Borel probability measures.

A dynamical system $(S, \sigma)$ is said to be minimal if the only closed stable subsets of $S$ are $S$ and $\emptyset$. Note that, by Proposition 2.4.2, this definition is consistent with the definition of a minimal shift space. A Borel probability measure $\mu$ on $S$ is positive if $\mu(U) > 0$ for every nonempty open set $U \subset S$.

Proposition 2.4.4 Any invariant Borel probability measure on a minimal topological dynamical system is positive.

Proof. Let $\mu$ be an invariant Borel probability measure on the topological dynamical system $(S, \sigma)$. Let $U \subset S$ be a nonempty open set. Let $Y = \bigcup_{n \ge 0} \sigma^{-n}(U)$ and $Z = S \setminus Y$. Since $U$ is open and $\sigma$ is continuous, each $\sigma^{-n}(U)$ is open. Thus $Y$ is open and $Z$ is closed. The set $Z$ is also stable. Indeed, if for $z \in Z$ we had $\sigma(z) \notin Z$, then there would be an integer $n \ge 0$ such that $\sigma(z) \in \sigma^{-n}(U)$. Thus $z \in \sigma^{-n-1}(U) \subset Y$, a contradiction. Thus $\sigma(Z) \subset Z$. Since $(S, \sigma)$ is minimal, this implies that $Z = \emptyset$ or $Z = S$. Since $U$ is nonempty, we have $Z = \emptyset$ and thus $Y = S$. Since $\mu$ is invariant, we have $\mu(\sigma^{-1}(U)) = \mu(U)$ and thus $\mu(\sigma^{-n}(U)) = \mu(U)$ for all $n \ge 0$. Hence we cannot have $\mu(U) = 0$ since it would imply $\mu(S) \le \sum_{n \ge 0} \mu(\sigma^{-n}(U)) = 0$, a contradiction since $\mu(S) = 1$. ■



Corollary 2.4.5 For any recurrent set $F$ there exists an invariant probability distribution on $F$. When $F$ is uniformly recurrent, such a distribution is positive.

Proof. Let $F$ be a recurrent set. By Proposition 2.2.1 there is a recurrent infinite word $x$ such that $F(x) = F$, and if $F$ is uniformly recurrent, then $x$ is uniformly recurrent.

By Theorem 2.4.3 there is an invariant Borel probability measure $\mu$ on $S = S(x)$.

Let $\pi$ be the map from $F$ to $[0, 1]$ defined by $\pi(w) = \mu(wA^{\mathbb N} \cap S)$. Let us verify that $\pi$ is an invariant probability distribution. Indeed, one has $\pi(1) = \mu(S) = 1$. Next, for $w \in F$,
$$\sum_{a \in S(w)} \pi(wa) = \sum_{a \in S(w)} \mu(waA^{\mathbb N} \cap S) = \mu(wA^{\mathbb N} \cap S) = \pi(w).$$
In the same way, since $\mu$ is invariant,
$$\sum_{a \in P(w)} \pi(aw) = \sum_{a \in P(w)} \mu(awA^{\mathbb N} \cap S) = \mu(\sigma^{-1}(wA^{\mathbb N} \cap S)) = \pi(w).$$

If $x$ is uniformly recurrent, by Proposition 2.2.5, the shift space $S = S(x)$ is minimal. By Proposition 2.4.4, the measure $\mu$ is positive. Since $wA^{\mathbb N} \cap S$ is a nonempty open set for any $w \in F$, we have $\pi(w) = \mu(wA^{\mathbb N} \cap S) > 0$ and thus $\pi$ is positive. ■
In some cases, there exists a unique invariant probability distribution on the set $F$. A morphism $f : A^* \to A^*$ is primitive if there exists an integer $k$ such that, for all $a, b \in A$, the letter $b$ appears in $f^k(a)$. If $f$ is a primitive morphism and if $f(a)$ starts with the letter $a$ for some $a \in A$, then $x = f^\omega(a)$ is a fix-point of $f$ and there is a unique invariant probability distribution $\pi_F$ on the set $F(x)$ (]4^, Theorem 5.6]). Moreover, this distribution is positive. We illustrate this result by the following examples.



Example 2.4.6 Let $F$ be the Fibonacci set (see Example 2.3.1). Since the morphism $f$ defined by $f(a) = ab$ and $f(b) = a$ is primitive, there is a unique invariant probability distribution on $F$. Its values on the words of length at most 4 are shown in Figure 2.1 with $\lambda = (\sqrt{5} - 1)/2$. The values of $\pi_F$ can be obtained as follows (see |^). The vector $v = [\pi_F(a)\ \ \pi_F(b)]$ is an eigenvector for the eigenvalue $1/\lambda$ of the $A \times A$-matrix $M$ defined by $M_{ab} = |f(a)|_b$. Here, we have
$$M = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}.$$
This implies $v = [\lambda \ \ 1 - \lambda]$. The other values can be computed using conditions (ii) and (iii) of the definition of an invariant probability distribution.

Figure 2.1: The invariant probability distribution on the Fibonacci set.
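The eigenvector computation of Example 2.4.6 can be checked numerically, and the values of $\pi_F$ can be compared with factor frequencies in a long prefix of the Fibonacci word. Using frequencies in a finite prefix as an estimate of $\pi_F$ is an assumption of this sketch; it yields approximations only, and the function names are illustrative.

```python
import math

lam = (math.sqrt(5) - 1) / 2
M = [[1, 1], [1, 0]]   # M[a][b] = |f(a)|_b for f(a) = ab, f(b) = a
v = [lam, 1 - lam]     # candidate row vector [pi_F(a), pi_F(b)]

def row_times_matrix(row, matrix):
    """Product of a row vector by a matrix."""
    return [sum(row[i] * matrix[i][j] for i in range(len(row)))
            for j in range(len(matrix[0]))]

def fibonacci_prefix(n_iter):
    """Return f^n_iter(a) for the Fibonacci morphism."""
    word = "a"
    for _ in range(n_iter):
        word = "".join({"a": "ab", "b": "a"}[c] for c in word)
    return word

def frequency(word, factor):
    """Proportion of windows of length len(factor) in `word` equal to `factor`."""
    n = len(factor)
    windows = len(word) - n + 1
    return sum(1 for i in range(windows) if word[i:i + n] == factor) / windows

print(row_times_matrix(v, M), [c / lam for c in v])  # v M compared with (1/lam) v
x = fibonacci_prefix(15)
print(frequency(x, "a"), lam)
print(frequency(x, "aa"), 2 * lam - 1)
```

The value $2\lambda - 1$ for the word $aa$ follows from conditions (ii) and (iii): since $bb$ is not in $F$, one has $\pi_F(ab) = \pi_F(b)$, hence $\pi_F(aa) = \pi_F(a) - \pi_F(ab) = 2\lambda - 1$.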



Example 2.4.7 Let $F$ be the Thue-Morse set (see Example 2.2.4). Since the Thue-Morse morphism is primitive, there is a unique invariant probability distribution on $F$. Its values on the words of length at most 4 are shown on Figure 2.2.



Figure 2.2: The invariant probability distribution on the Thue-Morse set.

3 Prefix codes in factorial sets

In this section, we study prefix codes in a factorial set. We will see that most properties known in the usual case are also true in this more general situation. Some of them are even true in the more general case of a prefix-closed set instead of a factorial set. In particular, this holds for the link between prefix codes and probability distributions (Proposition 3.3.4).

Recall that a set $X \subset A^+$ of nonempty words over an alphabet $A$ is a code if the relation

\[ x_1 \cdots x_n = y_1 \cdots y_m \]

with $n, m \ge 1$ and $x_1, \ldots, x_n, y_1, \ldots, y_m \in X$ implies $n = m$ and $x_i = y_i$ for $i = 1, \ldots, n$. For the general theory of codes, see [6].



3.1 Prefix codes

The prefix order is defined, for $u, v \in A^*$, by $u \le v$ if $u$ is a prefix of $v$. Two words $u, v$ are prefix-comparable if one is a prefix of the other. Thus $u$ and $v$ are prefix-comparable if and only if there are words $x, y$ such that $ux = vy$ or, equivalently, if and only if $uA^* \cap vA^* \ne \emptyset$. The suffix order, and the notion of suffix-comparable words, are defined symmetrically.

A set $X \subset A^+$ of nonempty words is a prefix code if any two distinct elements of $X$ are incomparable for the prefix order. A prefix code is a code.

The dual notion of a suffix code is defined symmetrically with respect to the suffix order.

The submonoid $M$ generated by a prefix code satisfies the following property: if $u, uv \in M$ then $v \in M$. Such a submonoid of $A^*$ is said to be right unitary. One can show that conversely, any right unitary submonoid of $A^*$ is generated by a prefix code (see [6]). The symmetric notion of a left unitary submonoid is defined by the condition $v, uv \in M$ implies $u \in M$.
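The definitions of prefix and suffix codes translate directly into a finite test. The following Python sketch is an illustration only; the function names `is_prefix_code` and `is_suffix_code` are hypothetical and words are represented as Python strings.

```python
from itertools import combinations

def is_prefix_code(words):
    """Return True if no word of the set is a prefix of another one."""
    words = set(words)
    if "" in words:
        return False  # a code consists of nonempty words
    return not any(u.startswith(v) or v.startswith(u)
                   for u, v in combinations(words, 2))

def is_suffix_code(words):
    """Dual test, obtained by reversing every word."""
    return is_prefix_code({w[::-1] for w in words})
```

For instance, the set $\{a, ba\}$ passes the first test but not the second one, since $a$ is a suffix of $ba$.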

We denote by $\underline{X}$ the characteristic series of a set $X \subset A^*$. By definition, for any $x \in A^*$,

\[ (\underline{X}, x) = \begin{cases} 1 & \text{if } x \in X, \\ 0 & \text{otherwise.} \end{cases} \]

The following is Proposition 3.1.6 in [6].

Proposition 3.1.1 Let $X$ be a prefix code and let $U = A^* \setminus XA^*$. Then

\[ \underline{A}^* = \underline{X}^*\,\underline{U} \quad\text{and}\quad \underline{X} - 1 = \underline{U}(\underline{A} - 1). \tag{3.1} \]

3.2 Automata

We recall the basic results on deterministic automata and prefix codes (see [6] for a more detailed exposition).

We denote $\mathcal{A} = (Q, i, T)$ a deterministic automaton with $Q$ as set of states, $i \in Q$ as initial state and $T \subset Q$ as set of terminal states. For $p \in Q$ and $w \in A^*$, we denote $p \cdot w = q$ if there is a path labeled $w$ from $p$ to the state $q$ and $p \cdot w = \emptyset$ otherwise.

The set recognized by the automaton is the set of words $w \in A^*$ such that $i \cdot w \in T$. A set of words is rational if it is recognized by a finite automaton.

All automata considered in this paper are deterministic and we call them simply automata.

The automaton $\mathcal{A}$ is trim if for any $q \in Q$, there is a path from $i$ to $q$ and a path from $q$ to some $t \in T$.

An automaton is called simple if it is trim and if it has a unique terminal state which coincides with the initial state.

An automaton $\mathcal{A} = (Q, i, T)$ is complete if for any state $p \in Q$ and any letter $a \in A$, one has $p \cdot a \ne \emptyset$.

For a set $X \subset A^*$, we denote by $\mathcal{A}(X)$ the minimal automaton of $X$. The states of $\mathcal{A}(X)$ are the nonempty sets $u^{-1}X = \{v \in A^* \mid uv \in X\}$ for $u \in A^*$. The initial state is the set $X$ and the terminal states are the sets $u^{-1}X$ for $u \in X$.

Let $X \subset A^*$ be a prefix code. Then there is a simple automaton $\mathcal{A} = (Q, 1, 1)$ that recognizes $X^*$. Moreover, the minimal automaton of $X^*$ is simple.

Let $X$ be a prefix code and let $P$ be the set of proper prefixes of $X$. The literal automaton of $X^*$ is the simple automaton $\mathcal{A} = (P, 1, 1)$ with transitions defined for $p \in P$ and $a \in A$ by

\[ p \cdot a = \begin{cases} pa & \text{if } pa \in P, \\ 1 & \text{if } pa \in X, \\ \emptyset & \text{otherwise.} \end{cases} \]

One verifies that this automaton recognizes $X^*$.
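The construction of the literal automaton can be sketched in a few lines of Python for a finite prefix code. This is an illustration under stated assumptions: the names `literal_automaton` and `accepts` are hypothetical, the empty string plays the role of the state $1$, and a missing entry of the transition table stands for $\emptyset$.

```python
def literal_automaton(X):
    """Transition table of the literal automaton of X* (X a finite prefix code).

    The states are the proper prefixes of the words of X.
    """
    X = set(X)
    P = {x[:i] for x in X for i in range(len(x))}  # proper prefixes of X
    alphabet = {c for x in X for c in x}
    delta = {}
    for p in P:
        for a in alphabet:
            if p + a in P:
                delta[(p, a)] = p + a
            elif p + a in X:
                delta[(p, a)] = ""  # back to the initial and terminal state
    return delta

def accepts(delta, w):
    """Run w from the state "" and accept if and only if the run ends in ""."""
    state = ""
    for a in w:
        if (state, a) not in delta:
            return False
        state = delta[(state, a)]
    return state == ""
```

With $X = \{a, ba\}$ the table has the two states $1$ and $b$, and the word $aba$ is accepted while $ab$ is not.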

Let $\mathcal{A} = (Q, i, T)$ be an automaton. For $w \in A^*$, we denote $\varphi_{\mathcal{A}}(w)$ the partial map from $Q$ to $Q$ defined by $p\varphi_{\mathcal{A}}(w) = q$ if $p \cdot w = q$. The transition monoid of $\mathcal{A}$ is the monoid of partial maps from $Q$ to $Q$ of the form $\varphi_{\mathcal{A}}(w)$ for $w \in A^*$.

3.3 Maximal prefix codes

Let $F$ be a subset of $A^*$. A set $X \subset A^*$ is right dense in $F \subset A^*$, or right $F$-dense, if any $u \in F$ is a prefix of $X$.

A set $X \subset A^*$ is right complete in $F$, or right $F$-complete, if $X^*$ is right dense in $F$, that is if every word in $F$ is a prefix of $X^*$.

A prefix code $X \subset F$ is maximal in $F$, or $F$-maximal, if it is not properly contained in any other prefix code $Y \subset F$. The notion of an $F$-maximal suffix code is symmetrical.

The following propositions are extensions of Propositions 3.3.1 and 3.3.2, and of Theorem 3.3.5 in [6].

Proposition 3.3.1 Let $F$ be a subset of $A^*$. For any prefix code $X \subset F$, the following conditions are equivalent.

(i) Every element of $F$ is prefix-comparable with some element of $X$,

(ii) $X$ is an $F$-maximal prefix code.

Proof. (i) implies (ii). Any word $u \in F$ is prefix-comparable with some word of $X$. This implies that if $u \notin X$, then $X \cup u$ is no longer a prefix code. Thus $X$ is an $F$-maximal prefix code.

(ii) implies (i). Assume that $u \in F$ is not prefix-comparable to any word in $X$. Then $X \cup u$ is prefix, and $X$ is not an $F$-maximal prefix code. ■

Proposition 3.3.2 Let $F$ be a factorial subset of $A^*$. For any set $X \subset F$ of nonempty words, the following conditions are equivalent.

(i) Every element of $F$ is prefix-comparable with some element of $X$,

(ii) $XA^*$ is right $F$-dense,

(iii) $X$ is right $F$-complete.

Proof. (i) implies (ii). Let $u \in F$. Let $x \in X$ be prefix-comparable with $u$. Then there exist $v, w$ such that $uv = xw$. Thus $XA^*$ is right $F$-dense.

(ii) implies (iii). Consider a word $u \in F$. Let us show that $u$ is a prefix of $X^*$. Since $XA^*$ is right $F$-dense, one has $uw = xw'$ for some word $x \in X$ and $w, w' \in A^*$. If $u$ is a prefix of $x$, there is nothing to prove. Otherwise, $x$ is a proper prefix of $u$. Thus $u = xu'$ for some $x \in X$ and $u' \in A^*$. Since $u$ is in $F$ and since $F$ is factorial, we have $u' \in F$. Since $x \ne 1$, we have $|u'| < |u|$. Arguing by induction, the word $u'$ is a prefix of $X^*$. Thus $u$ is a prefix of $X^*$.

(iii) implies (i). Let $u \in F$. Then $u$ is a prefix of $X^*$, and consequently $u$ is prefix-comparable with a word in $X$. ■

The propositions have a dual formulation, replacing prefix by suffix, and right by left.

Example 3.3.3 The set $X = \{a, ba\}$ is a maximal prefix code in the Fibonacci set $F$ since $XA^*$ is right $F$-dense.

The following is a generalization of Propositions 3.7.1 and 3.7.2 in [6].

Proposition 3.3.4 Let $F$ be a right essential set. Let $\pi$ be a positive right probability distribution on $F$. Any prefix code $X \subset F$ satisfies $\pi(X) \le 1$. If $X$ is finite, it is $F$-maximal if and only if $\pi(X) = 1$.



Proof. Assume first that $X$ is finite. Let $n$ be the maximal length of the words in $X$. We have

\[ \bigcup_{x \in X} xA^{n-|x|} \cap F \subset A^n \cap F \tag{3.2} \]

and the terms of the union are pairwise disjoint. Thus, using Equation (2.3),

\[ \pi(X) = \sum_{x \in X} \pi(xA^{n-|x|} \cap F) \le \pi(A^n \cap F) = 1. \tag{3.3} \]

If $X$ is maximal in $F$, any word in $F \cap A^n$ has a prefix in $X$. Thus we have equality in (3.2) and thus also in (3.3). This shows that $\pi(X) = 1$. The converse is clear since $\pi$ is positive on $F$.

If $X$ is infinite, then $\pi(Y) \le 1$ for any finite subset $Y$ of $X$. Thus $\pi(X) \le 1$. ■

The statement has a dual for a suffix code included in a factorial set $F$ with a positive left probability distribution on $F$.



Example 3.3.5 Let $F$ be the Fibonacci set. The set $X = \{a, ba\}$ is an $F$-maximal prefix code (Example 3.3.3). One has $\pi_F(X) = 1$ where $\pi_F$ is defined in Example 2.4.6.
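Both claims of Example 3.3.5 can be tested on a finite sample of the Fibonacci set. The Python sketch below makes several assumptions that are not in the text: the sample consists of the factors of length at most 8 of a long prefix of the Fibonacci word, the helper names are hypothetical, and the value $\pi_F(ba) = \pi_F(b) = 1 - \lambda$ is used because $bb \notin F$, so that $ba$ is the only right extension of $b$ in $F$.

```python
import math

def fibonacci_prefix(n_iter=15):
    """Prefix of the Fibonacci word obtained by iterating a -> ab, b -> a."""
    w = "a"
    for _ in range(n_iter):
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w

def factors(word, max_len):
    """All factors of `word` of length at most max_len (empty word included)."""
    return {word[i:i + k] for k in range(max_len + 1)
            for i in range(len(word) - k + 1)}

def is_maximal_prefix_code_in(X, F):
    """Criterion (i) of Proposition 3.3.1, tested on the finite sample F."""
    return all(any(u.startswith(x) or x.startswith(u) for x in X) for u in F)

F = factors(fibonacci_prefix(), 8)
X = {"a", "ba"}

lam = (math.sqrt(5) - 1) / 2
pi = {"a": lam, "ba": 1 - lam}  # values of pi_F on the words of X
total = sum(pi[x] for x in X)
```

The test on a finite sample is of course only a sanity check of Proposition 3.3.1 (i), not a proof of $F$-maximality.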



We will use the following result in the proof of Proposition 4.4.5.

Proposition 3.3.6 Let $F$ be a right essential subset of $A^*$, and let $G \subset F$ be a right essential subset of $F$. For any finite $F$-maximal prefix code $X \subset F$, the set $X \cap G$ is a finite $G$-maximal prefix code.

Proof. Set $Y = X \cap G$. The set $Y$ is clearly a finite prefix code. We show that every $u \in G$ is prefix-comparable with some word in $Y$. This will imply that $Y$ is $G$-maximal by Proposition 3.3.1. Let $u \in G$. Since $G$ is right essential, there are arbitrarily long words $w$ such that $uw \in G$. Choose the length of $uw$ larger than the maximal length of the words of $X$. Since $X$ is an $F$-maximal prefix code, $uw$ has a prefix $x$ in $X$. This prefix $x$ is in $Y$ since $uw \in G$. Thus $u$ is prefix-comparable to $x \in Y$. ■

The following example shows that Proposition 3.3.6 is false for infinite prefix codes.

Example 3.3.7 Let $F \subset A^*$ be a right essential set with $F \ne A^*$, and let $x$ be a word which is not in $F$. Let $X = A^*x \setminus A^*xA^+$ be the prefix code of words in $A^*$ ending with $x$ and having no other occurrence of $x$. $X$ is a maximal prefix code, and $X \cap F = \emptyset$ is not $F$-maximal.

We will use later the following result on transformations of prefix codes. It is adapted from Proposition 3.4.9 in [6].

Proposition 3.3.8 Let $F$ be a factorial set and let $X \subset F$ be an $F$-maximal prefix code. Let $w$ be a nonempty prefix of $X$ and set $D = w^{-1}X$. The set $Y = (X \setminus wD) \cup w$ is an $F$-maximal prefix code.

Proof. It is clear that $Y$ is a prefix code. To show that it is $F$-maximal, we apply Proposition 3.3.1 and prove that every word $u \in F$ is prefix-comparable with a word of $Y$. So consider a word $u \in F$. Since $X$ is $F$-maximal, $u$ is prefix-comparable with a word of $X$. Thus $u$ is prefix-comparable with a word of $X \setminus wD$ or it is prefix-comparable with a word of $wD$. In the second case, either $u$ is a prefix of a word $wd$ with $d \in D$ or $u$ has $wd$ as a prefix. Consequently, $u$ is prefix-comparable with $w$. This proves that $u$ is prefix-comparable with a word of $Y$. ■

Proposition 3.3.8 has a dual formulation for suffix codes.



3.4 Average length

Let $F$ be a right essential set and let $\pi$ be a right probability distribution on $F$. Let $X \subset F$ be a prefix code such that $\pi(X) = 1$. The average length of $X$ with respect to $\pi$ is the sum

\[ \lambda(X) = \sum_{x \in X} |x|\,\pi(x). \]

Proposition 3.4.1 Let $F$ be a right essential set and let $\pi$ be a positive right probability distribution on $F$. Let $X \subset F$ be a finite $F$-maximal prefix code and let $P$ be the set of proper prefixes of $X$. Then $\pi(X) = 1$ and $\lambda(X) = \pi(P)$.

Proof. We already know that $\pi(X) = 1$ by Proposition 3.3.4. Let us show that for any $p \in P$,

\[ \pi(p) = \sum_{x \in pA^+ \cap X} \pi(x). \tag{3.4} \]

Let indeed $n$ be an integer larger than the lengths of the words of $X$. Then by Equation (2.3), $\pi(p) = \pi(pA^n \cap F)$. Since $X$ is an $F$-maximal prefix code, each word of $pA^n \cap F$ has a prefix in $X$, and conversely, each word in $X$ which has $p$ as a prefix is itself a prefix of $pA^n \cap F$. Thus

\[ pA^n \cap F = \bigcup_{x \in pA^+ \cap X} xA^{n+|p|-|x|} \cap F. \]

Since $\pi(xA^{n+|p|-|x|} \cap F) = \pi(x)$, this proves Equation (3.4). By Equation (3.4), one gets

\[ \sum_{p \in P} \pi(p) = \sum_{p \in P} \sum_{x \in pA^+ \cap X} \pi(x) = \sum_{x \in X} \sum_{p \in P :\, x \in pA^+} \pi(x) = \sum_{x \in X} |x|\,\pi(x). \]

Thus

\[ \pi(P) = \sum_{p \in P} \pi(p) = \sum_{x \in X} |x|\,\pi(x) = \lambda(X). \]

■

A dual statement of Proposition 3.4.1 holds for a suffix code and its set of proper suffixes, for a positive left probability distribution.



Example 3.4.2 Let $F$ be the Fibonacci set and let $X = \{a, ba\}$. We have already seen in Example 3.3.5 that $X$ is an $F$-maximal prefix code and that $\pi_F(X) = 1$ where $\pi_F$ is the unique invariant probability distribution on $F$ defined in Example 2.4.6. We have $\lambda(X) = \lambda + 2(1 - \lambda) = 2 - \lambda$. On the other hand the set of proper prefixes of $X$ is $P = \{1, b\}$ and thus $\pi_F(P) = 1 + (1 - \lambda) = 2 - \lambda$.
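The equality $\lambda(X) = \pi(P)$ of Proposition 3.4.1 can be replayed numerically on this example. The fragment below is a sketch; the dictionary `pi` simply hard-codes the four values of $\pi_F$ used in Example 3.4.2, and the variable names are chosen for illustration.

```python
import math

lam = (math.sqrt(5) - 1) / 2
# Values of pi_F used in Example 3.4.2; "" stands for the empty word 1.
pi = {"": 1.0, "a": lam, "b": 1 - lam, "ba": 1 - lam}

X = {"a", "ba"}
# Proper prefixes of the words of X.
P = {x[:i] for x in X for i in range(len(x))}

average_length = sum(len(x) * pi[x] for x in X)
weight_of_prefixes = sum(pi[p] for p in P)
```

Both quantities evaluate to $2 - \lambda \approx 1.382$.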



4 Bifix codes in recurrent sets

In this section, we study bifix codes contained in a recurrent set. Since $A^*$ itself is a recurrent set, it is a generalization of the usual situation. We will see that all results on maximal bifix codes can be generalized in this way. In particular, the notions of degree, of kernel and of derived code can be defined in this more general framework.

4.1 Parses

Recall that a set $X$ of nonempty words is a bifix code if any two distinct elements of $X$ are incomparable for the prefix order and for the suffix order.

A parse of a word $w$ with respect to a set $X$ is a triple $(v, x, u)$ such that $w = vxu$ with $v \in A^* \setminus A^*X$, $x \in X^*$ and $u \in A^* \setminus XA^*$.

Proposition 4.1.1 Let $F$ be a factorial set and let $X \subset F$ be a set. For any factorization $w = uv$ of $w \in F$, there is a parse $(s, yz, p)$ of $w$ with $y, z \in X^*$, $sy = u$ and $v = zp$.

Proof. Since $w \in F$, there exist, by Proposition 3.1.1, words $z \in X^*$ and $p \in A^* \setminus XA^*$ such that $v = zp$. Symmetrically, there exist $y \in X^*$ and $s \in A^* \setminus A^*X$ such that $u = sy$. Then $(s, yz, p)$ is a parse of $w$ which satisfies the conditions of the statement. ■

The number of parses of a word $w$ with respect to $X$ is denoted by $\delta_X(w)$. The function $\delta_X : A^* \to \mathbb{N}$ is the parse enumerator with respect to $X$.

The indicator of a set $X$ is the series $L_X$ defined for $w \in A^*$ by $(L_X, w) = \delta_X(w)$.

Example 4.1.2 Let $X = \emptyset$. Then $\delta_X(w) = |w| + 1$.

The following is a reformulation of Proposition 6.1.6 in [6].

Proposition 4.1.3 Let $F$ be a factorial set and let $X \subset F$ be a prefix code. For every word $w \in F$, the number $\delta_X(w)$ is equal to the number of prefixes of $w$ which have no suffix in $X$.

Proof. For every prefix $v$ of $w$ which is in $A^* \setminus A^*X$, there is a unique parse of $w$ of the form $(v, x, u)$. Since any parse is obtained in this way, the statement is proved. ■

Proposition 4.1.3 has a dual statement for suffix codes.

Note that, as a consequence of Proposition 4.1.3, we have for two prefix codes $X, Y$, and for all words $w$,

\[ X \subset Y \implies \delta_Y(w) \le \delta_X(w). \tag{4.1} \]

Indeed, a word without suffix in $Y$ is also a word without suffix in $X$.

Proposition 4.1.4 Let $X$ be a prefix code and let $V = A^* \setminus A^*X$. Then

\[ \underline{V} = L_X (1 - \underline{A}). \tag{4.2} \]

If $X$ is bifix, one has

\[ 1 - \underline{X} = (1 - \underline{A}) L_X (1 - \underline{A}). \tag{4.3} \]



Proof. Set $L = L_X$. Let $U = A^* \setminus XA^*$. By definition of the indicator, we have $L = \underline{V}\,\underline{X}^*\,\underline{U}$. Since $X$ is prefix, we have by Proposition 3.1.1, the equality $\underline{A}^* = \underline{X}^*\,\underline{U}$. Thus we obtain $L = \underline{V}\,\underline{A}^*$ (note that this is actually equivalent to Proposition 4.1.3). Multiplying both sides on the right by $(1 - \underline{A})$, we obtain Equation (4.2).

If $X$ is suffix, we have by the dual of Proposition 3.1.1, the equality $1 - \underline{X} = (1 - \underline{A})\underline{V}$. This gives Equation (4.3) by multiplying both sides of Equation (4.2) on the left by $1 - \underline{A}$. ■

The following is Proposition 6.1.11 in [6].

Proposition 4.1.5 A function $\delta : A^* \to \mathbb{N}$ is the parse enumerator of some bifix code if and only if it satisfies the following conditions.

(i) For any $a \in A$ and $w \in A^*$,

\[ 0 \le \delta(aw) - \delta(w) \le 1. \tag{4.4} \]

(ii) For any $w \in A^*$ and $a \in A$,

\[ 0 \le \delta(wa) - \delta(w) \le 1. \tag{4.5} \]

(iii) For any $a, b \in A$ and $w \in A^*$,

\[ \delta(aw) + \delta(wb) \ge \delta(w) + \delta(awb). \tag{4.6} \]

(iv) $\delta(1) = 1$.

The following is a reformulation of Proposition 6.1.12 in [6].

Proposition 4.1.6 Let $X$ be a prefix code. For any $u \in A^*$ and $a \in A$, one has

\[ \delta_X(ua) = \begin{cases} \delta_X(u) & \text{if } ua \in A^*X, \\ \delta_X(u) + 1 & \text{otherwise.} \end{cases} \tag{4.7} \]

Proof. This follows directly from Proposition 4.1.3. ■

Proposition 4.1.6 has a dual for suffix codes expressing $\delta_X(au)$ in terms of $\delta_X(u)$.

Recall also that by Proposition 6.1.8 in [6], for a bifix code $X$ and for all $u, v, w \in F$ such that $uvw \in F$, one has

\[ \delta_X(v) \le \delta_X(uvw). \tag{4.8} \]

Moreover, if $uvw \in X$ and $u, w \in A^+$ then the inequality is strict, that is,

\[ \delta_X(v) < \delta_X(uvw). \tag{4.9} \]
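The parse enumerator can be computed in two ways for a finite set $X$: by listing the parses from the definition, or, when $X$ is a prefix code, by the recurrence (4.7). The Python sketch below implements both; the function names `in_star`, `parses` and `parse_count` are hypothetical and the brute-force enumeration is only meant for short words.

```python
def in_star(w, X):
    """True if w belongs to X* (dynamic programming over the prefixes of w)."""
    ok = [True] + [False] * len(w)
    for j in range(1, len(w) + 1):
        ok[j] = any(ok[j - len(x)] and w.endswith(x, 0, j)
                    for x in X if len(x) <= j)
    return ok[len(w)]

def parses(w, X):
    """List the parses (v, x, u) of w: v has no suffix in X, x is in X*,
    and u has no prefix in X."""
    result = []
    for i in range(len(w) + 1):
        v = w[:i]
        if any(v.endswith(x) for x in X):
            continue
        for j in range(i, len(w) + 1):
            x, u = w[i:j], w[j:]
            if in_star(x, X) and not any(u.startswith(y) for y in X):
                result.append((v, x, u))
    return result

def parse_count(w, X):
    """delta_X(w) for a prefix code X, following the recurrence (4.7)."""
    count = 1  # delta_X(1) = 1
    for j in range(1, len(w) + 1):
        if not any(w[:j].endswith(x) for x in X):
            count += 1
    return count
```

For $X = \emptyset$ the enumeration returns $|w| + 1$ parses, in agreement with Example 4.1.2.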

4.2 Maximal bifix codes

Let $F$ be a set of words. A set $X \subset F$ is said to be thin in $F$, or $F$-thin, if there exists a word of $F$ which is not a factor of a word in $X$.

The following example shows that there exist a uniformly recurrent set $F$, and a bifix code $X \subset F$ which is not $F$-thin.

Example 4.2.1 Let $F$ be the Thue-Morse set, which is the set of factors of a fix-point of the substitution $f$ defined by $f(a) = ab$, $f(b) = ba$ (see Example 2.2.4). Set $x_n = f^n(a)$ for $n \ge 1$. Note that $x_{n+1} = x_n \bar{x}_n$ where $u \to \bar{u}$ is the substitution defined by $\bar{a} = b$ and $\bar{b} = a$. Note also that $u \in F$ if and only if $\bar{u} \in F$. Consider the set $X = \{x_n x_n \mid n \ge 1\}$. We have $X \subset F$. Indeed, for $n \ge 1$, $x_{n+2} = x_{n+1}\bar{x}_{n+1} = x_n \bar{x}_n \bar{x}_n x_n$ implies that $\bar{x}_n \bar{x}_n \in F$ and thus $x_n x_n \in F$. Next $X$ is a bifix code. Indeed, for $n < m$, $x_m$ begins with $x_n \bar{x}_n$, and thus cannot have $x_n x_n$ as a prefix. Similarly, since $x_m$ ends with $x_n \bar{x}_n$ or with $\bar{x}_n x_n$, it cannot have $x_n x_n$ as a suffix. Finally any element of $F$ is a factor of a word in $X$. Indeed, any element $u$ of $F$ is a factor of some $x_n$, and thus of $x_n x_n \in X$.

A simpler proof uses Theorem 4.4.3 proved later.
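The first few words of the set $X$ of Example 4.2.1 can be generated and tested mechanically. The following Python sketch is an illustration only: the names `thue_morse_power`, `bar`, `x` and `squares` are hypothetical, and only the words $x_n x_n$ with $n \le 5$ are considered.

```python
def thue_morse_power(n):
    """Return x_n = f^n(a) for the substitution f(a) = ab, f(b) = ba."""
    w = "a"
    for _ in range(n):
        w = "".join("ab" if c == "a" else "ba" for c in w)
    return w

def bar(w):
    """Exchange the letters a and b."""
    return w.translate(str.maketrans("ab", "ba"))

N = 5
x = {n: thue_morse_power(n) for n in range(1, N + 4)}
# The words x_n x_n for 1 <= n <= N, a finite part of the set X.
squares = [x[n] + x[n] for n in range(1, N + 1)]
```

Since $\bar{x}_{n+2} = \bar{x}_n x_n x_n \bar{x}_n$ is a factor of $x_{n+3}$, each word $x_n x_n$ can be located inside $x_{n+3}$, which is what the sample checks.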



An internal factor of a word $x$ is a word $v$ such that $x = uvw$ with $u, w$ nonempty. Let $F \subset A^*$ be a factorial set and let $X \subset F$ be a set. Denote by

\[ I(X) = \{w \in A^* \mid A^+wA^+ \cap X \ne \emptyset\} \]

the set of internal factors of words in $X$ (the set $I(X)$ is denoted by $H(X)$ in [6]).

When $F$ is right essential and left essential, then $X$ is $F$-thin if and only if $F \setminus I(X) \ne \emptyset$. Indeed, the condition is necessary. Conversely, if $w$ is in $F \setminus I(X)$, let $a, b \in A$ be such that $awb \in F$. Since $awb$ cannot be a factor of a word in $X$, it follows that $X$ is $F$-thin.

We say that a bifix code $X \subset F$ is maximal in $F$, or $F$-maximal, if it is not properly contained in any other bifix code $Y \subset F$.

The following is a generalisation of Proposition 6.2.1 in [6].

Theorem 4.2.2 Let $F$ be a recurrent set and let $X \subset F$ be an $F$-thin set. The following conditions are equivalent.

(i) $X$ is an $F$-maximal bifix code.

(ii) $X$ is a left $F$-complete prefix code.

(ii') $X$ is a right $F$-complete suffix code.

(iii) $X$ is an $F$-maximal prefix code and an $F$-maximal suffix code.


As a preparation for the proof of Theorem 4.2.2, we introduce the following notation. Let $F$ be a recurrent set and let $X \subset F$.

A factorization of a word $u$ is a pair $(p, s)$ of words such that $u = ps$. We denote by $\mathrm{Fact}(u)$ the set of factorizations of $u$.

Let $C(X, F)$ be the set of pairs $(u, v)$ of words such that $uvu \in F$, $v \ne 1$ and $u$ is not an internal factor of $X$. We define for each pair $(u, v) \in C(X, F)$ a relation $\varphi_{u,v}$ on the set $\mathrm{Fact}(u)$ as follows. For $\pi = (p, s)$, $\rho = (q, t) \in \mathrm{Fact}(u)$, one has $(\pi, \rho) \in \varphi_{u,v}$ if and only if the pair $(\pi, \rho)$ satisfies one of the following conditions (see Figure 4.1).

(i) $px = q$ for some $x \in X$,

(ii) $svq = x_1 \cdots x_n$ with $n \ge 1$ and $x_i \in X$ for $1 \le i \le n$, $s$ is a proper prefix of $x_1$ and $q$ is a proper suffix of $x_n$.

Since $ps = qt$, the condition (i) is equivalent to $s = xt$. This means that both conditions are symmetric for reading from left to right or from right to left.

Figure 4.1: The relation $\varphi_{u,v}$ (case (ii)).

We prove a series of lemmas concerning the relations $\varphi_{u,v}$ (see Exercise 6.2.1 in [6]).

Lemma 4.2.3 Let $F$ be a recurrent set and let $X \subset F$ be an $F$-thin set. If $X$ is a prefix code, then for all pairs $(u, v) \in C(X, F)$, the relation $\varphi_{u,v}$ is a partial function from $\mathrm{Fact}(u)$ into itself, that is

\[ (\pi, \rho), (\pi, \rho') \in \varphi_{u,v} \implies \rho = \rho'. \tag{4.10} \]

Conversely, if $X$ is an $F$-maximal suffix code, and if (4.10) holds for all pairs $(u, v) \in C(X, F)$, then $X$ is a prefix code.

Define the transpose $\varphi^t_{u,v}$ of the relation $\varphi_{u,v}$ by the condition $(\rho, \pi) \in \varphi^t_{u,v}$ if $(\pi, \rho) \in \varphi_{u,v}$. Then (4.10) expresses the fact that the transpose $\varphi^t_{u,v}$ is injective.



Proof. Assume first that $X$ is a prefix code. For $(u, v) \in C(X, F)$, let $\pi = (p, s)$, $\rho = (q, t)$, $\rho' = (q', t')$ be three factorizations of $u$ such that $(\pi, \rho), (\pi, \rho') \in \varphi_{u,v}$. We prove that $\rho = \rho'$. By definition, the following cases may occur for $(\pi, \rho)$, $(\pi, \rho')$.

(1) $px = q$ and $px' = q'$, with $x, x' \in X$,

(2) $px = q$ with $x \in X$, and $svq' = x'_1 \cdots x'_m$ with $m \ge 1$ and $x'_1, \ldots, x'_m \in X$, and moreover $s$ is a proper prefix of $x'_1$ and $q'$ is a proper suffix of $x'_m$,

(3) $px' = q'$ with $x' \in X$ and $svq = x_1 \cdots x_n$ with $n \ge 1$ and $x_1, \ldots, x_n \in X$, and moreover $s$ is a proper prefix of $x_1$ and $q$ is a proper suffix of $x_n$,

(4) $svq = x_1 \cdots x_n$ and $svq' = x'_1 \cdots x'_m$ with $n \ge 1$, $m \ge 1$, $x_1, \ldots, x_n, x'_1, \ldots, x'_m \in X$, and moreover $s$ is a proper prefix both of $x_1$ and of $x'_1$, $q$ is a proper suffix of $x_n$ and $q'$ is a proper suffix of $x'_m$.

(1) Assume that $px = q$, $px' = q'$, with $x, x' \in X$. Since $q$ and $q'$ are prefixes of $u$, they are prefix-comparable. Thus $x$ and $x'$ are also prefix-comparable. Since $X$ is a prefix code, it follows that $x = x'$, whence $q = q'$ and $\rho = \rho'$.

(2) We show that this case is impossible. Indeed, $x$ is a prefix of $s$ (by $ps = qt = pxt$) and $s$ is a proper prefix of $x'_1$, thus $x$ is a proper prefix of $x'_1$, and this is impossible because $X$ is a prefix code. The same argument holds in the symmetric case (3).

(4) Since $u = qt = q't'$, the words $q$ and $q'$ are prefix-comparable. We may suppose that $q = q'w$ (see Figure 4.2). Since $svq$, $svq'$ are in $X^*$ and $X$ is a prefix code, we have $w \in X^*$. Since $X$ is a code, the decompositions $svq = x_1 \cdots x_n = svq'w = x'_1 \cdots x'_m w$ coincide. Consequently, $w = x_{m+1} \cdots x_n$. By hypothesis, $q = q'w = q'x_{m+1} \cdots x_n$ is a proper suffix of $x_n$. This forces $n = m$, $w = 1$ and $q = q'$, hence $\rho = \rho'$.

Figure 4.2: The factorizations $(p, s)$, $(q, t)$ and $(q', t')$ with $t' = wt$.

Conversely, assume that $X$ is an $F$-maximal suffix code and that it is not a prefix code. Let $x', x''$ be distinct words in $X$ such that $x'$ is a prefix of $x''$. Set $x'' = x'r'$ with $r' \ne 1$.

Since $X$ is $F$-thin, there is a word $w \in F \setminus I(X)$. Since $F$ is recurrent, there is a word $r''$ such that $x''r''w \in F$. Let $u = r'r''w$. Then $x''r''w = x'u \in F$. Let $t$ be a word such that $utx'u \in F$. Set $v = tx'$. Thus $(u, v) \in C(X, F)$ (see Figure 4.3). By the dual of Equation (3.1), there exist $p \in A^* \setminus A^*X$ and $z \in X^*$ such that $ut = pz$.

Since $X$ is left $F$-complete, $p$ is a proper suffix of a word in $X$. Since $u \notin I(X)$, $p$ is a prefix of $u$. Thus $z = 1$ or $z \in X^+$. In the latter case, set $z = z_1 \cdots z_n$ with $z_i \in X$. Since $ut = pz$, one of the following two cases holds:

(1) $u = pz'$, with $z', t \in X^*$,

(2) there is an $i$ with $1 \le i \le n$ such that $z_i = rs$ with $u = pz_1 \cdots z_{i-1}r$, $t = sz_{i+1} \cdots z_n$, and $r \ne 1$, $s \ne 1$.

In case (1), consider the three factorizations $\pi = (u, 1)$, $\rho = (1, u)$ and $\rho' = (r', r''w)$ of $u$. Since $r' \ne 1$, we have $\rho \ne \rho'$. We have $v = tx' \in X^+$, and thus $(\pi, \rho) \in \varphi_{u,v}$ (this is case (ii) of the definition with $s = q = 1$). Next, $vr' = tx'r' = tx'' \in X^+$, with $t \in X^*$ and where $r'$ is a proper suffix of $x''$. Hence $(\pi, \rho') \in \varphi_{u,v}$. Thus, $\varphi_{u,v}$ is not a partial function.

In case (2), let $\pi = (pz_1 \cdots z_{i-1}, r)$ and let $\rho, \rho'$ be as above. We have $rv = rtx' = rsz_{i+1} \cdots z_n x' \in X^+$, whence $(\pi, \rho) \in \varphi_{u,v}$. Next, $rvr' = rsz_{i+1} \cdots z_n x'r' = rsz_{i+1} \cdots z_n x'' \in X^+$, and $r'$ is a proper suffix of $x''$. Thus $(\pi, \rho') \in \varphi_{u,v}$. Since $\rho \ne \rho'$, $\varphi_{u,v}$ is not a partial function. ■

Figure 4.3: $\varphi_{u,v}$ is not a partial function.



Lemma 4.2.3 has a dual formulation for suffix codes: if $X$ is a suffix code, then for all pairs $(u, v) \in C(X, F)$, the relation $\varphi_{u,v}$ is injective: if $(\pi, \rho), (\pi', \rho) \in \varphi_{u,v}$, then $\pi = \pi'$. Conversely, if $X$ is an $F$-maximal prefix code, and if this implication holds for all pairs $(u, v) \in C(X, F)$, then $X$ is a suffix code.

Recall that a set $X \subset F$ is right $F$-complete if any word of $F$ is a prefix of $X^*$.


Lemma 4.2.4 Let $F$ be a recurrent set and let $X \subset F$ be an $F$-thin set. The set $X$ is right $F$-complete if and only if, for all pairs $(u, v) \in C(X, F)$, the relation $\varphi_{u,v}$ contains a total function from $\mathrm{Fact}(u)$ into itself, that is for every $\pi \in \mathrm{Fact}(u)$, there exists $\rho \in \mathrm{Fact}(u)$ such that $(\pi, \rho) \in \varphi_{u,v}$.

Proof. Assume first that $X$ is right $F$-complete. Let $u, v \in F$ be such that $(u, v) \in C(X, F)$. Let $\pi = (p, s) \in \mathrm{Fact}(u)$. Suppose first that $s$ has a prefix $x$ in $X$. Let $s = xt$, with $x \in X$. Thus $u = ps = pxt$. Let $q = px$ and $\rho = (q, t)$. Then $(\pi, \rho) \in \varphi_{u,v}$. Suppose next that $s$ has no prefix in $X$. Since $X$ is right $F$-complete, there exists a word $w$ such that $svuw = x_1 \cdots x_m$, with $x_1, \ldots, x_m \in X$.

Let $n$ be the smallest integer such that $sv$ is a prefix of $x_1 \cdots x_n$, $1 \le n \le m$. Let $q$ be the prefix of $uw$ such that $svq = x_1 \cdots x_n$. Since $sv \ne 1$, $q$ is a proper suffix of $x_n$. The word $q$ is a prefix of $u$ since $u$ is not an internal factor of $X$. Define the factorization $\rho = (q, t)$ of $u$ by $svq = x_1 \cdots x_n$. Since $s$ has no prefix in $X$, the word $s$ is a proper prefix of $x_1$. Therefore, $(\pi, \rho) \in \varphi_{u,v}$. This shows that $\varphi_{u,v}$ contains a total function.

Conversely, assume that for all $(u, v) \in C(X, F)$, the relation $\varphi_{u,v}$ contains a total function from $\mathrm{Fact}(u)$ into itself. We show that any $u \in F$ is prefix-comparable with a word of $X$. By Proposition 3.3.2, this implies that $X$ is right $F$-complete.

Let $u \in F$. Since $X$ is $F$-thin, the set $F \setminus I(X)$ is nonempty. Let $w \in F \setminus I(X)$ and let $v$ be such that $uvw \in F$. Set $r = uvw$. Note that $r \in F \setminus I(X)$. Let $z \ne 1$ be such that $rzr \in F$. Then $(r, z) \in C(X, F)$. Set $\pi = (1, r)$. Since $\varphi_{r,z}$ contains a total function, there is a factorization $\rho = (q, t)$ of $r$ such that $(\pi, \rho) \in \varphi_{r,z}$. If $q \in X$, then $r$ has the prefix $q$ in $X$, the word $u$ is prefix-comparable with $q$, and we obtain the conclusion. Otherwise, we have $uvwzq = x_1 \cdots x_n$ with $x_i \in X$ and $uvw$ is a prefix of $x_1$, whence our conclusion again. ■



Lemma 4.2.4 has a dual formulation for left $F$-complete sets: the set $X$ is left $F$-complete if and only if, for all pairs $(u, v) \in C(X, F)$, the transpose of the relation $\varphi_{u,v}$ contains a total function from $\mathrm{Fact}(u)$ into itself.

Proposition 4.2.5 Let $F$ be a recurrent set and let $X \subset F$ be an $F$-thin and $F$-maximal prefix code. Then $X$ is a suffix code if and only if it is left $F$-complete.

Proof. Since $X$ is an $F$-maximal prefix code, by Lemmas 4.2.3 and 4.2.4, for any pair $(u, v) \in C(X, F)$, the relation $\varphi_{u,v}$ is a total function from $\mathrm{Fact}(u)$ into itself.

Assume first that $X$ is a suffix code. Then, by the dual of Lemma 4.2.3, for any pair $(u, v) \in C(X, F)$, the function $\varphi_{u,v}$ from $\mathrm{Fact}(u)$ into itself is injective. Since $\mathrm{Fact}(u)$ is a finite set, $\varphi_{u,v}$ is also surjective for any pair $(u, v) \in C(X, F)$. This implies by the dual of Lemma 4.2.4 that $X$ is left $F$-complete.

Assume conversely that $X$ is left $F$-complete. By the dual of Lemma 4.2.4, the function $\varphi_{u,v}$ maps $\mathrm{Fact}(u)$ onto itself for every pair $(u, v) \in C(X, F)$. This implies as above that it is also injective. By the dual of Lemma 4.2.3, and since $X$ is an $F$-maximal prefix code, $X$ is a suffix code. ■

Proposition 4.2.5 has a dual formulation for an $F$-maximal suffix code.



Proof of Theorem 4.2.2. We first show that (i) implies (ii). If $X$ is an $F$-maximal suffix code, then $X$ is left $F$-complete and thus condition (ii) is true. Assume next that $X$ is an $F$-maximal prefix code. Since $X$ is suffix, by Proposition 4.2.5 it is left $F$-complete and thus (ii) holds. Finally assume that $X$ is neither an $F$-maximal prefix code nor an $F$-maximal suffix code. Let $y, z \in F$ be such that $X \cup y$ is prefix and $X \cup z$ is suffix. Since $F$ is recurrent, there is a word $u$ such that $yuz \in F$. Then $X \cup yuz$ is bifix and thus we get a contradiction.

The proof that (i) implies (ii') is similar.

(ii) implies (iii). Consider the set $Y = X \setminus A^+X$. It is a suffix code by definition. It is prefix since it is contained in $X$. It is left $F$-complete. Indeed, one has $A^*X = A^*Y$ and thus $A^*Y$ is left $F$-dense by the dual of Proposition 3.3.2. Hence $Y$ is an $F$-maximal suffix code. By the dual of Proposition 4.2.5, the set $Y$ is right $F$-complete. Thus $Y$ is an $F$-maximal prefix code. This implies that $X = Y$ and thus that $X$ is an $F$-maximal prefix code and an $F$-maximal suffix code.

The proof that (ii') implies (iii) is similar. It is clear that (iii) implies (i). ■

Example 4.2.6 Let $A = \{a, b\}$ and let $F$ be the set of words without factor $bb$ (Example |T|). The set $X = \{aaa, aaba, ab, baa, baba\}$ is a finite $F$-maximal bifix code. As an example of computation of the relation $\varphi_{u,v}$, note that for $u = aaa$ and $v = b$, we have $\mathrm{Fact}(u) = \{\pi_1, \pi_2, \pi_3, \pi_4\}$ with $\pi_1 = (1, aaa)$, $\pi_2 = (a, aa)$, $\pi_3 = (aa, a)$, $\pi_4 = (aaa, 1)$. The function $\varphi_{u,v}$ is the cycle $(\pi_1 \pi_4 \pi_3)$ and fixes $\pi_2$.
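The computation of $\varphi_{u,v}$ in Example 4.2.6 can be reproduced by a direct enumeration of conditions (i) and (ii). The following Python sketch makes a few choices that are not in the text: a factorization $(p, s)$ of $u$ is encoded by the length of $p$, so that $\pi_k$ corresponds to the integer $k - 1$, and the helper names are hypothetical.

```python
def factorizations_in_star(w, X):
    """Yield every factorization of w as a tuple of words of X."""
    if not w:
        yield ()
        return
    for x in X:
        if w.startswith(x):
            for rest in factorizations_in_star(w[len(x):], X):
                yield (x,) + rest

def phi(u, v, X):
    """Return phi_{u,v} as a set of pairs (|p|, |q|) of encoded factorizations."""
    relation = set()
    for i in range(len(u) + 1):
        s = u[i:]
        for j in range(len(u) + 1):
            q = u[:j]
            # Condition (i): px = q for some x in X.
            if j > i and u[i:j] in X:
                relation.add((i, j))
                continue
            # Condition (ii): svq = x_1 ... x_n with s a proper prefix of x_1
            # and q a proper suffix of x_n.
            for xs in factorizations_in_star(s + v + q, X):
                if xs and len(s) < len(xs[0]) and len(q) < len(xs[-1]):
                    relation.add((i, j))
                    break
    return relation
```

On the data of the example, the relation obtained is the function $0 \mapsto 3$, $3 \mapsto 2$, $2 \mapsto 0$, $1 \mapsto 1$, that is the cycle $(\pi_1 \pi_4 \pi_3)$ together with the fixed point $\pi_2$.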



The following example shows that Theorem 4.2.2 is false if $F$ is not recurrent.

Example 4.2.7 Let $F = a^*b^*$. Then $X = \{aa, ab, b\}$ is an $F$-maximal prefix code. It is not a suffix code but it is left $F$-complete, as can easily be verified.
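The verification left to the reader in Example 4.2.7 can be carried out on a finite sample of $F = a^*b^*$. The Python sketch below is an illustration; the helper names are hypothetical and only the words $a^ib^j$ of length at most 8 are tested. It relies on the observation that $w$ is a suffix of a word of $X^*$ if and only if $w = sy$ with $s$ a (possibly empty) suffix of a word of $X$ and $y \in X^*$.

```python
def in_star(w, X):
    """True if w belongs to X* (dynamic programming over the prefixes of w)."""
    ok = [True] + [False] * len(w)
    for j in range(1, len(w) + 1):
        ok[j] = any(ok[j - len(x)] and w.endswith(x, 0, j)
                    for x in X if len(x) <= j)
    return ok[len(w)]

def is_suffix_of_star(w, X):
    """True if w = s y with s a suffix of a word of X and y in X*."""
    suffixes = {x[i:] for x in X for i in range(len(x) + 1)}
    return any(w[:k] in suffixes and in_star(w[k:], X)
               for k in range(len(w) + 1))

X = {"aa", "ab", "b"}
# Finite sample of F = a*b*: the words a^i b^j of length at most 8.
F = {"a" * i + "b" * j for i in range(9) for j in range(9 - i)}

prefix_maximal = all(any(u.startswith(x) or x.startswith(u) for x in X)
                     for u in F)
is_suffix_code = not any(x != y and x.endswith(y) for x in X for y in X)
left_complete = all(is_suffix_of_star(w, X) for w in F)
```

The set fails to be a suffix code because $b$ is a suffix of $ab$.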



Let $F \subset A^*$ be a factorial set. The $F$-degree, denoted $d_F(X)$, of a set $X \subset A^*$ is the maximal number of parses of words of $F$ with respect to $X$, that is

\[ d_F(X) = \max_{w \in F} \delta_X(w). \]

The $F$-degree of a set $X$ is finite or infinite. The $A^*$-degree is called the degree, and is denoted $d(X)$. Observe that $d_F(X) = d_F(X \cap F)$, and that $d_F(X) \le d(X)$.

The following is a generalization of Theorem 6.3.1 in [6].

Theorem 4.2.8 Let $F$ be a recurrent set and let $X \subset F$ be a bifix code. Then $X$ is an $F$-thin and $F$-maximal bifix code if and only if its $F$-degree $d_F(X)$ is finite. In this case,

\[ I(X) = \{w \in F \mid \delta_X(w) < d_F(X)\}. \tag{4.11} \]

Proof. Assume first that $X$ is an $F$-thin and $F$-maximal bifix code. Since $X$ is $F$-thin, $F \setminus I(X)$ is not empty. Let $u \in F \setminus I(X)$ and $w \in F$. Since $F$ is recurrent, there is a word $v \in F$ such that $uvw \in F$. Since $X$ is prefix, by Proposition 4.1.3, the number of parses of $u$ is equal to the number of prefixes of $u$ which have no suffix in $X$. Since $X$ is left $F$-complete, the set of words in $F$ which have no suffix in $X$ coincides with the set $S$ of words which are proper suffixes of words in $X$. Since $u$ is not an internal factor of a word in $X$, any prefix of $uvw$ which is in $S$ is a prefix of $u$. Thus $\delta_X(uvw) = (\underline{S}\,\underline{A}^*, uvw) = (\underline{S}\,\underline{A}^*, u) = \delta_X(u)$. Since by Equation (4.8), $\delta_X(w) \le \delta_X(uvw)$, we get $\delta_X(w) \le \delta_X(u)$. This shows that $\delta_X$ is bounded, and thus that the $F$-degree of $X$ is finite. Moreover, this shows that $F \setminus I(X)$ is contained in the set of words of $F$ with maximal value of $\delta_X$. Conversely, consider $w \in I(X)$. Then there exist $w' \in X$ and $p, s \in A^+$ such that $w' = pws$. Then by Equation (4.9) $\delta_X(w') > \delta_X(w)$, and thus $\delta_X(w)$ is not maximal in $F$. This proves Equation (4.11).

Conversely, let $w \in F$ be a word with $\delta_X(w) = d_F(X)$. For any nonempty word $u \in F$ such that $uw \in F$ we have $uw \in XA^*$. Indeed, set $u = au'$ with $a \in A$ and $u' \in F$. Then $\delta_X(au'w) \ge \delta_X(u'w) \ge \delta_X(w)$ by Equation (4.8). This implies $\delta_X(au'w) = \delta_X(u'w) = \delta_X(w)$. By the dual of Equation (4.7) we obtain that $uw \in XA^*$.

This implies first that $X$ is $F$-thin and next that $XA^*$ is right $F$-dense. Indeed suppose that $w$ is an internal factor of a word in $X$. Let $p, s \in F \setminus 1$ be such that $pws \in X$. Since $pw \in F$, the previous argument shows that $pw \in XA^*$, a contradiction. Thus $w \in F \setminus I(X)$. This shows that $X$ is $F$-thin.

Next, and since $F$ is recurrent, for any $v \in F$, there is a word $u \in F$ such that $vuw \in F$. Then $vuw \in XA^*$ by using again the above argument. Thus $XA^*$ is right $F$-dense and $X$ is an $F$-maximal bifix code by Theorem 4.2.2. ■



Example 4.2.9 Let $F$ be the Fibonacci set. The set $X = \{a, bab, baab\}$ is a finite bifix code. Since it is finite, it is $F$-thin. It is an $F$-maximal prefix code as one may check on Figure 2.1. Thus it is, by Theorem 4.2.2, an $F$-thin and $F$-maximal bifix code. The parses of the word $bab$ are $(1, bab, 1)$ and $(b, a, b)$. Since $bab$ is not in $I(X)$, one has $d_F(X) = 2$.

Example 4.2.10 Let $F$ be the Fibonacci set. The set $X = \{aaba, ab, baa, baba\}$ is a bifix code. It is $F$-maximal since it is right $F$-complete (see Figure 2.1). It has $F$-degree 3. Indeed, the word $aaba$ has three parses $(1, aaba, 1)$, $(a, ab, a)$ and $(aa, 1, ba)$ and it is in $F \setminus I(X)$.
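The parses listed in Examples 4.2.9 and 4.2.10 can be recomputed by brute force. The following is a minimal sketch, not part of the original text: it assumes that the Fibonacci set is the set of factors of the fixed point of the morphism a -> ab, b -> a, that a parse of $w$ is a triple $(v, x, u)$ with $w = vxu$, $v$ without suffix in $X$, $x \in X^*$ and $u$ without prefix in $X$, and that a long finite prefix of the Fibonacci word already contains every short factor. The function names are illustrative only, and the empty word is written `""`.

```python
def fibonacci_prefix(iterations=15):
    """Prefix of the Fibonacci word, the fixed point of a -> ab, b -> a."""
    word = "a"
    for _ in range(iterations):
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word


def factors(word, max_len):
    """All factors of `word` of length <= max_len, including the empty word."""
    return {word[i:i + n] for n in range(max_len + 1)
            for i in range(len(word) - n + 1)}


def in_star(x, code):
    """True if x is a product of words of `code`, that is, x is in X*."""
    reachable = [True] + [False] * len(x)
    for end in range(1, len(x) + 1):
        reachable[end] = any(
            len(c) <= end and reachable[end - len(c)] and x[end - len(c):end] == c
            for c in code
        )
    return reachable[-1]


def parses(w, code):
    """Triples (v, x, u) with w = vxu, v without suffix in X,
    x in X*, and u without prefix in X."""
    found = []
    for i in range(len(w) + 1):
        v = w[:i]
        if any(v.endswith(c) for c in code):
            continue
        for j in range(i, len(w) + 1):
            x, u = w[i:j], w[j:]
            if in_star(x, code) and not any(u.startswith(c) for c in code):
                found.append((v, x, u))
    return found


def f_degree(code, F):
    """Maximal number of parses over a finite sample F of the factorial set."""
    return max(len(parses(w, code)) for w in F)


F = factors(fibonacci_prefix(), 10)   # finite sample of the Fibonacci set
X1 = {"a", "bab", "baab"}             # Example 4.2.9
X2 = {"aaba", "ab", "baa", "baba"}    # Example 4.2.10

print(parses("bab", X1))
print(parses("aaba", X2))
print(f_degree(X1, F), f_degree(X2, F))
```

Since the maximum is taken over factors of length at most 10 only, the computed value is a lower bound for the $F$-degree in general; for these two codes it agrees with the values 2 and 3 stated in the examples.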

The following result establishes the link between maximal bifix codes and $F$-maximal ones.

Theorem 4.2.11 Let $F$ be a recurrent set. For any thin maximal bifix code $X \subset A^*$ of degree $d$, the set $Y = X \cap F$ is an $F$-thin and $F$-maximal bifix code. One has $d_F(Y) \le d$ with equality when $X$ is finite.

Proof. Recall that $d_F(Y) = d_F(X \cap F) = d_F(X) \le d$. Thus $d_F(Y)$ is finite and by Theorem 4.2.8, $Y$ is an $F$-thin and $F$-maximal bifix code. If $X$ is finite, then each word which is in $F$ and is longer than the longest words in $X$ has $d$ parses. Thus $d_F(X) = d$, whence $d_F(Y) = d$. ■



Example 4.2.12 The set $X = a \cup ba^*b$ is a maximal bifix code of degree 2. Let $F$ be the Fibonacci set. Then $X \cap F = \{a, baab, bab\}$ (see Figure 2.1).

As another example, let $Z = \{a^3, a^2ba, a^2b^2, ab, ba^2, baba, bab^2, b^2a, b^3\}$. The set $Z$ is a finite maximal bifix code of degree 3. Then $Z \cap F = \{a^2ba, ab, ba^2, baba\}$ (see Figure 2.1).

Example 4.2.13 Let $F$ be the Thue-Morse set. Consider again $X = a \cup ba^*b$. Then $X \cap F = \{a, baab, bab, bb\}$ is a finite $F$-maximal bifix code of $F$-degree 2 (see Figure 2.2).



The following examples show that a strict inequality can hold in Theorem 4.2.11. The second example shows that this may happen even if all letters occur in the words of $F$.

Example 4.2.14 Let $A = \{a, b\}$ and let $X = a \cup ba^*b$. The set $X$ is a maximal bifix code of degree 2. Let $F = a^*$. Then $F$ is a recurrent set. We have $Y = X \cap F = a$. The $F$-degree of $Y$ is 1.

Example 4.2.15 Let $A = \{a, b\}$ and let $X \subset A^*$ be the maximal bifix code of degree 3 with kernel $K = \{aa, ab, ba\}$. Let $F$ be the Fibonacci set. Since $K = A^2 \cap F$, $K$ is an $F$-maximal bifix code. Since $K \subset X \cap F$ and $K$ is $F$-maximal, one has $X \cap F = K$. Next $K = A^2 \cap F$ and Theorem 4.2.11 imply that $d_F(K) = 2$. Thus $d(X) = 3$ and $d_F(X \cap F) = 2$.
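The intersections $X \cap F$ appearing in Examples 4.2.12 to 4.2.15 can be checked mechanically. The sketch below is an illustration under stated assumptions: the Fibonacci and Thue-Morse sets are taken to be the factor sets of the fixed points of a -> ab, b -> a and a -> ab, b -> ba, each approximated by the factors of a long finite prefix, and the code $a \cup ba^*b$ is encoded as a regular expression. All names are hypothetical helpers.

```python
import re


def morphic_prefix(images, iterations):
    """Iterate a morphism (given by the images of the letters) on the word 'a'."""
    word = "a"
    for _ in range(iterations):
        word = "".join(images[c] for c in word)
    return word


def factors(word, max_len):
    """All factors of `word` of length <= max_len, including the empty word."""
    return {word[i:i + n] for n in range(max_len + 1)
            for i in range(len(word) - n + 1)}


X = re.compile(r"a|ba*b")  # the maximal bifix code a U ba*b
Z = {"aaa", "aaba", "aabb", "ab", "baa", "baba", "babb", "bba", "bbb"}

fibonacci = morphic_prefix({"a": "ab", "b": "a"}, 15)
thue_morse = morphic_prefix({"a": "ab", "b": "ba"}, 10)


def code_in(word, max_len=8):
    """Words of a U ba*b among the factors of `word` of length <= max_len."""
    return {w for w in factors(word, max_len) if X.fullmatch(w)}


print(sorted(code_in(fibonacci)))           # Example 4.2.12, first part
print(sorted(Z & factors(fibonacci, 4)))    # Example 4.2.12, second part
print(sorted(code_in(thue_morse)))          # Example 4.2.13
print(sorted(w for w in factors(fibonacci, 2) if len(w) == 2))  # A^2 in F
```

The length bound 8 is harmless here because neither word contains the factor $aaa$, so no word $ba^nb$ with $n \ge 3$ can occur.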

4.3 Derivation

We first show that the notion of derived code can be extended to $F$-maximal bifix codes. The following result generalizes Proposition 6.4.4 in

The kernel of a set of words $X$ is the set of words in $X$ which are internal factors of words in $X$. We denote by $K(X)$ the kernel of $X$. Note that $K(X) = I(X) \cap X$.

Theorem 4.3.1 Let $F$ be a recurrent set. Let $X \subset F$ be a bifix code of finite $F$-degree $d \ge 2$. Set $I = I(X)$ and $K = K(X)$. Let $G = (IA \cap F) \setminus I$ and $D = (AI \cap F) \setminus I$. Then the set $X' = K \cup (G \cap D)$ is a bifix code of $F$-degree $d - 1$.



The code $X'$ is called the derived code of $X$ with respect to $F$ or $F$-derived code.

The proof uses two lemmas. Let $P$ be the set of proper prefixes of $X$ and let $S$ be the set of proper suffixes of $X$.

Lemma 4.3.2 One has $G \subset S$ and $D \subset P$.

Proof. By Theorem 4.2.8, the parse enumerator of $X$ is bounded on $F$ and $F \setminus I(X) = F \setminus I$ is the set of words in $F$ with maximal value $d_F(X)$. Let $y = ha$ be in $G$ with $h \in I$ and $a \in A$. Since $y \notin I$, we have $\delta_X(ha) > \delta_X(h)$. Thus, by Proposition 4.1.6, $y = ha$ does not have a suffix in $X$. Since $A^*X$ is left $F$-dense, this implies that $y$ is a proper suffix of a word in $X$. Thus $y$ is in $S$. The proof that $D \subset P$ is symmetrical. ■

Lemma 4.3.3 For any $x \in X \setminus K$, the shortest prefix of $x$ which is not in $I$ is in $X'$.

Proof. Since $x \notin K$, we have $x \notin I$. Let $x'$ be the shortest prefix of $x$ which is not in $I$ or, equivalently, such that $\delta_X(x') = d_F(X)$. Let us show that $x' \in X'$. First, $x'$ is a proper prefix of $x$. Set indeed $x = pa$ with $p \in A^*$ and $a \in A$. Since $x \in X$, we have by Equation (4.7), $\delta_X(x) = \delta_X(p)$. Thus $p \notin I$ and $x'$ is a prefix of $p$.

Since $1 \in I$, we have $x' \ne 1$. Set $x' = p'a'$ with $p' \in A^*$ and $a' \in A$. By definition of $x'$ we have $p' \in I$. Thus $x' \in G = (IA \cap F) \setminus I$.

Next, set $x' = a''s$ with $a'' \in A$ and $s \in A^*$. Since $x' \notin XA^*$, we have by the dual of Equation (4.7), $\delta_X(s) < \delta_X(x')$. Thus $s$ is in $I$. This shows that $x' \in D$. Thus we conclude that $x' \in G \cap D \subset X'$. ■

There is a dual of Lemma 4.3.3 concerning the shortest suffix of a word in $X \setminus K$.



Proof of Theorem 4.3.1. We first prove that $X'$ is a prefix code. Suppose first that $k \in K$ is a prefix of a word $z$ in $G \cap D$. By Lemma 4.3.2, a word in $D$ is a proper prefix of $X$. Thus $k \in X$ would be a proper prefix of $X$, which is impossible since $X$ is prefix.

Suppose next that a word $u$ of $G \cap D$ is a prefix of a word $k$ in $K$. Since $k$ is in $I$, it follows that $u$ is in $I$, a contradiction.

Finally, no word $y \in G \cap D$ can be a proper prefix of another word $y'$ in $G \cap D$, otherwise $y' = yz$, with $z \in A^+$. Therefore, since $G \subset S$ by Lemma 4.3.2, there is a nonempty word $t$ such that $ty' = tyz \in X$. Consequently, $y \in G \cap I$, a contradiction.

Thus $X'$ is a prefix code. To show that it is $F$-maximal, it is enough to show that any word in $X$ has a prefix in $X'$.

Consider indeed $x \in X$. If $x$ is in $K$ then $x \in X'$. Otherwise, let $x'$ be the shortest prefix of $x$ which is not in $I$. By Lemma 4.3.3, we have $x' \in X'$. Thus $X'$ is an $F$-maximal prefix code.

A symmetric argument shows that $X'$ is an $F$-maximal suffix code.

Let us show that $d_F(X') = d_F(X) - 1$. We first note that $G \cap D \ne \emptyset$. Indeed, let $x \in X$ be such that $\delta_X(x)$ is maximal on $X$. If $x$ were an internal factor of a word $y \in X$, then by Equation (4.9) $\delta_X(x) < \delta_X(y)$, which contradicts our assumption. Thus $x \notin K$. This shows that $K$ is not an $F$-maximal bifix code and thus that $X' \setminus K = G \cap D \ne \emptyset$. Consider $x' \in G \cap D$. Since $(G \cap D) \cap I(X)$ is empty, and since $I(X') \subset I(X)$, $x'$ cannot be in $I(X')$. Thus the number of parses of $x'$ with respect to $X'$ is $d_F(X')$.

Let $P'$ be the set of proper prefixes of $X'$. We show that $x'$ has $d_F(X) - 1$ suffixes which are in $P'$. This will show that $d_F(X') = d_F(X) - 1$ by the dual of Proposition 4.1.3.

Since $x' \in F \setminus I$, we have $\delta_X(x') = d_F(X)$. Thus $x'$ has $d_F(X)$ suffixes in $P$. One of them is $x'$ itself since $x' \in D \subset P$. Let $p$ be a proper suffix of $x'$ which is in $P$. Let us show that $p$ does not have a prefix in $X'$. Indeed, arguing by contradiction, assume that $x'' \in X'$ is a prefix of $p$. We cannot have $x'' \in K$ since $p$ is a proper prefix of a word in $X$. We cannot have either $x'' \in G \cap D$. Indeed, since $x'$ is in $AI$, $p$ is in $I$ and thus also $x'' \in I$. Thus $p$ cannot have a prefix in $X'$. Since $X'$ is an $F$-maximal prefix code, this implies that $p$ is a proper prefix of $X'$. Thus, the $d_F(X) - 1$ proper suffixes of $x'$ which are in $P$ are in $P'$. ■

Example 4.3.4 Let $F$ be the Fibonacci set. Let $X = \{a, bab, baab\}$. The set $X$ is an $F$-thin and $F$-maximal bifix code of $F$-degree 2 (see Example 4.2.9). We have $K = \{a\}$, $I = \{1, a, aa\}$, $G = \{b, ab, aab\}$ and $D = \{b, ba, baa\}$. Thus $X' = \{a, b\}$.
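The sets $I$, $K$, $G$, $D$ and the derived code of Example 4.3.4 follow directly from the definitions in Theorem 4.3.1, so they can be recomputed. The sketch below assumes the Fibonacci set is sampled by the short factors of a long prefix of the fixed point of a -> ab, b -> a; the helper names are illustrative, and the empty word is written `""`.

```python
def fibonacci_prefix(iterations=15):
    """Prefix of the Fibonacci word, the fixed point of a -> ab, b -> a."""
    word = "a"
    for _ in range(iterations):
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word


def factors(word, max_len):
    """All factors of `word` of length <= max_len, including the empty word."""
    return {word[i:i + n] for n in range(max_len + 1)
            for i in range(len(word) - n + 1)}


def internal_factors(X):
    """Words w such that pws is in X for some nonempty p and s."""
    return {x[i:j] for x in X
            for i in range(1, len(x)) for j in range(i, len(x))}


def derived_code(X, F, alphabet="ab"):
    """Return (X', I, K, G, D) as in Theorem 4.3.1, for a finite sample F."""
    I = internal_factors(X)
    K = X & I
    G = {h + a for h in I for a in alphabet if h + a in F} - I   # (IA n F) \ I
    D = {a + h for h in I for a in alphabet if a + h in F} - I   # (AI n F) \ I
    return K | (G & D), I, K, G, D


F = factors(fibonacci_prefix(), 6)

derived, I, K, G, D = derived_code({"a", "bab", "baab"}, F)
print(sorted(I), sorted(K), sorted(G), sorted(D), sorted(derived))

# Deriving the code of F-degree 3 from Example 4.2.10 gives a code of F-degree 2.
print(sorted(derived_code({"aaba", "ab", "baa", "baba"}, F)[0]))
```

In the second call the derived code is $A^2 \cap F = \{aa, ab, ba\}$, which is consistent with the drop of the $F$-degree from 3 to 2.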

The following is a generalization of Proposition 6.3.14 in

Proposition 4.3.5 Let $F$ be a recurrent set. Let $X \subset F$ be a bifix code of $F$-degree $d \ge 2$. Let $S$ be the set of proper suffixes of $X$ and set $I = I(X)$. The set $S \setminus I$ is an $F$-maximal prefix code and the set $S \cap I$ is the set of proper suffixes of the derived code $X'$.

The proof uses the following lemma.

Lemma 4.3.6 Let $F$ be a recurrent set. Let $X \subset F$ be an $F$-thin and $F$-maximal bifix code. Let $S$ be the set of proper suffixes of $X$ and set $I = I(X)$. For any $w \in F \setminus I$ the longest prefix of $w$ which is in $S$ is not in $I$.

Proof. Let $s$ be the longest prefix of $w$ which is in $S$. Set $w = st$. Let us show that for any prefix $t'$ of $t$, we have $\delta_X(st') = \delta_X(s)$. It is true for $t' = 1$. Assume that it is true for $t'$ and let $a \in A$ be the letter such that $t'a$ is a prefix of $t$. Since $st'a \notin S$, we have $st'a \in A^*X$. Thus by Equation (4.7), this implies $\delta_X(st'a) = \delta_X(st')$. Thus $\delta_X(st'a) = \delta_X(s)$. We conclude that $\delta_X(st) = \delta_X(s)$. Since $w = st$ is in $F \setminus I$, and since $F \setminus I$ is the set of words in $F$ with maximal value of $\delta_X$, this implies that $s \in F \setminus I$. ■

This lemma has a dual statement for the longest suffix of a word $w \in F \setminus I$ which is in $P$.



Proof of Proposition 4.3.5. Set $Y = S \setminus I$. Let us first show that $Y$ is prefix. Assume that $u, uv \in Y$. Since $uv \in S$ there is a nonempty word $p$ such that $puv \in X$. Since $u \notin I$, this forces $v = 1$. Thus $Y$ is prefix.

We show next that $YA^*$ is right $F$-dense. Consider $u \in F$ and let $w \in F \setminus I$. Since $F$ is recurrent, there exists $v \in F$ such that $uvw \in F$. Let $s$ be the longest word of $S$ which is a prefix of $uvw$. By Lemma 4.3.6, we have $s \in F \setminus I$. Thus $s \in S \setminus I = Y$ and $uvw \in YA^*$. This shows that $YA^*$ is right $F$-dense.

Let us now show that the set $S'$ of proper suffixes of the words of $X'$ is $S \cap I$. Let $s$ be a proper suffix of a word $x' \in X'$. If $x' \in K$, then $s$ is in $S \cap I$. Suppose next that $x' \in G \cap D$. Since $G \subset S$ by Lemma 4.3.2, we have $s \in S$. Furthermore, since $D \subset AI$, we have $s \in I$. This shows that $s \in S \cap I$.

Conversely, let $s$ be in $S \cap I$. Let $x \in X$ be such that $s$ is a proper suffix of $x$. If $x$ is in $K$ then $x$ is in $X'$ and thus $s$ is in $S'$. Otherwise, let $y$ be the shortest suffix of $x$ which is not in $I$. By the dual of Lemma 4.3.3, the word $y$ is in $X'$. Then $s$ is a proper suffix of $y$ (since $s \in I$ and $y \notin I$) and therefore $s$ is in $S'$. ■






There is a dual version of Proposition 4.3.5 concerning the set of proper prefixes of an $F$-thin and $F$-maximal bifix code $X \subset F$.
The following property generalizes Theorem 6.3.15 in

Theorem 4.3.7 Let $F$ be a recurrent set. Let $X$ be a bifix code of finite $F$-degree $d$. The set of its nonempty proper suffixes is a disjoint union of $d - 1$ $F$-maximal prefix codes.

Proof. Let $S$ be the set of proper suffixes of $X$. If $d = 1$, then $S \setminus 1$ is empty. If $d \ge 2$, by Proposition 4.3.5, the set $Y = S \setminus I$ is an $F$-maximal prefix code and the set $S \cap I$ is equal to the set $S'$ of proper suffixes of the words of the derived code $X'$. Arguing by induction, the set $S' \setminus 1$ is a disjoint union of $d - 2$ $F$-maximal prefix codes. Thus $S \setminus 1 = Y \cup (S' \setminus 1)$ is a disjoint union of $d - 1$ $F$-maximal prefix codes. ■

The following generalizes Corollary 6.3.16 in with two restrictions. First, it applies only in the case of finite maximal bifix codes instead of thin bifix codes (in order to be able to use Proposition 3.3.4). Next, it applies only for recurrent sets such that there exists a positive invariant probability distribution (in order to be able to use Proposition 3.4.1).

Corollary 4.3.8 Let $F$ be a recurrent set such that there exists a positive invariant probability distribution $\pi$ on $F$. Let $X$ be a finite bifix code of finite $F$-degree $d$. The average length of $X$ with respect to $\pi$ is equal to $d$.

Proof. Let $\pi$ be a positive invariant probability distribution on $F$. By the dual of Proposition 3.4.1, one has $\lambda(X) = \pi(S)$. In view of Theorem 4.3.7, we have $S \setminus 1 = Y_1 \cup \dots \cup Y_{d-1}$ where each $Y_i$ is a finite $F$-maximal prefix code. By Proposition 3.3.4 we have $\pi(Y_i) = 1$ for $1 \le i \le d - 1$. Since $\pi(1) = 1$, we obtain $\lambda(X) = \pi(1) + \sum_{i=1}^{d-1} \pi(Y_i) = d$. ■



Example 4.3.9 Let $F$ be the Fibonacci set and let $X = \{a, bab, baab\}$ (Example 4.3.4). The set $X$ is an $F$-maximal bifix code of $F$-degree 2. With respect to the unique invariant probability distribution of $F$ (Example 2.4.6), we have $\lambda(X) = \lambda + 3(2 - 3\lambda) + 4(2\lambda - 1) = 2$.
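The arithmetic of Example 4.3.9 can be spelled out term by term. The derivation below is a sketch that reads the three summands as $|x|\,\pi(x)$ for $x = a$, $bab$, $baab$, so that $\pi(a) = \lambda$, $\pi(bab) = 2 - 3\lambda$ and $\pi(baab) = 2\lambda - 1$; this reading is an assumption drawn from the shape of the formula, with $\lambda$ the constant of Example 2.4.6.

```latex
\begin{align*}
\lambda(X) &= 1\cdot\pi(a) + 3\cdot\pi(bab) + 4\cdot\pi(baab) \\
           &= \lambda + 3(2 - 3\lambda) + 4(2\lambda - 1) \\
           &= (1 - 9 + 8)\,\lambda + (6 - 4) \;=\; 2, \\[4pt]
\pi(X)     &= \lambda + (2 - 3\lambda) + (2\lambda - 1) \;=\; 1 .
\end{align*}
```

The coefficient of $\lambda$ cancels, so the average length equals the $F$-degree 2 as Corollary 4.3.8 predicts, and the second line checks that the three probabilities sum to 1.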

Now we show that an $F$-thin and $F$-maximal bifix code is determined by its $F$-degree and its kernel. We first prove the following generalization of Proposition 6.4.1 from

Proposition 4.3.10 Let $F$ be a recurrent set. Let $X \subset F$ be a bifix code of finite $F$-degree $d$ and let $K$ be the kernel of $X$. Let $Y$ be a set such that $K \subset Y \subset X$. Then for all $w \in I(X) \cup Y$,

$$\delta_Y(w) = \delta_X(w). \tag{4.12}$$

For all $w \in F$,

$$\delta_X(w) = \min\{d, \delta_Y(w)\}. \tag{4.13}$$

Proof. Denote by $F(w)$ the set of factors of the word $w$. Notice that Equation (4.3) is equivalent to $L_X = \underline{A}^*(1 - \underline{X})\underline{A}^*$. Thus, to prove (4.12), we have to show that for any $w \in I(X) \cup Y$ one has $F(w) \cap X = F(w) \cap Y$. The inclusion $F(w) \cap Y \subset F(w) \cap X$ is clear. Conversely, if $w$ is in $I(X)$, then $F(w) \cap X \subset K$ and thus $F(w) \cap X \subset F(w) \cap Y$. Next, assume that $w$ is in $Y$. The words in $F(w) \cap X$ other than $w$ are all in $K$. Thus we have again $F(w) \cap X \subset F(w) \cap Y$.

To show Equation (4.13), assume first that $w \in I(X)$. Then $\delta_X(w) < d$ by Theorem 4.2.8. Moreover, $\delta_X(w) = \delta_Y(w)$ by Equation (4.12). Thus Equation (4.13) holds. Next, suppose that $w \in F \setminus I(X)$. Then $\delta_X(w) = d$. Since $Y \subset X$, we have $\delta_X(w) \le \delta_Y(w)$. This proves (4.13). ■

Proposition 4.3.10 will be used to prove the following generalization of Theorem 6.4.2 in .

Theorem 4.3.11 Let $F$ be a recurrent set and let $X \subset F$ be a bifix code of finite $F$-degree $d$. For any $w \in F$, one has

$$\delta_X(w) = \min\{d, \delta_{K(X)}(w)\}.$$

In particular $X$ is determined by its $F$-degree and its kernel.

Proof. Take $Y = K(X)$ in Proposition 4.3.10. Then the formula follows from Equation (4.13). Next $X$ is determined by $L_X$, and so by $\delta_X$, through Equation (4.3). ■
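The formula of Theorem 4.3.11 lends itself to a numerical sanity check on the Fibonacci examples of this section. The sketch below is an illustration, not a proof: it counts parses with the characterization quoted from Proposition 4.1.3 (for a prefix code, the number of parses of $w$ is the number of prefixes of $w$ with no suffix in the code), it samples the Fibonacci set by the factors of length at most 10 of a long prefix of the fixed point of a -> ab, b -> a, and it takes the $F$-degrees from the examples in the text. Function names are hypothetical.

```python
def fibonacci_prefix(iterations=15):
    """Prefix of the Fibonacci word, the fixed point of a -> ab, b -> a."""
    word = "a"
    for _ in range(iterations):
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word


def factors(word, max_len):
    """All factors of `word` of length <= max_len, including the empty word."""
    return {word[i:i + n] for n in range(max_len + 1)
            for i in range(len(word) - n + 1)}


def delta(w, code):
    """Number of parses of w for a prefix code: prefixes with no suffix in the code."""
    return sum(1 for i in range(len(w) + 1)
               if not any(w[:i].endswith(c) for c in code))


def kernel(X):
    """Words of X that are internal factors of words of X."""
    internal = {x[i:j] for x in X
                for i in range(1, len(x)) for j in range(i, len(x))}
    return X & internal


def formula_holds(X, d, F):
    """Check delta_X(w) == min(d, delta_K(w)) on every word of the sample F."""
    K = kernel(X)
    return all(delta(w, X) == min(d, delta(w, K)) for w in F)


F = factors(fibonacci_prefix(), 10)
examples = [
    ({"a", "bab", "baab"}, 2),           # kernel {a}
    ({"aa", "ab", "ba"}, 2),             # empty kernel
    ({"aa", "aba", "b"}, 2),             # kernel {b}
    ({"aaba", "ab", "baa", "baba"}, 3),  # kernel {ab}
]
for code, degree in examples:
    print(sorted(code), sorted(kernel(code)), formula_holds(code, degree, F))
```

Note that for the empty kernel the count `delta(w, set())` is $|w| + 1$, so the formula reduces to $\delta_X(w) = \min\{d, |w| + 1\}$, which is the parse count of $A^d \cap F$.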



We now state the following generalization of Theorem 6.4.3 in [||. 

Theorem 4.3.12 Let $F$ be a recurrent set. A bifix code $Y \subset F$ is the kernel of some bifix code of finite $F$-degree $d$ if and only if

(i) $Y$ is not an $F$-maximal bifix code,

(ii) $\max\{\delta_Y(y) \mid y \in Y\} \le d - 1$.

Proof. Let $X$ be an $F$-thin and $F$-maximal bifix code of $F$-degree $d$ and let $Y = K(X)$ be its kernel. Condition (i) is satisfied because $X = Y$ implies that $X$ is equal to its derived code which has $F$-degree $d - 1$. Moreover, for every $y \in Y$ one has $\delta_X(y) \le d - 1$. Since $\delta_X(y) = \delta_Y(y)$ by Equation (4.12), condition (ii) is also satisfied.

Conversely, let $Y \subset F$ be a bifix code satisfying conditions (i) and (ii). Let $\delta : A^* \to \mathbb{N}$ be the function defined by

$$\delta(w) = \min\{d, \delta_Y(w)\}.$$

It can be verified that the function $\delta$ satisfies the four conditions of Proposition 4.1.5. Thus $\delta$ is the parse enumerator of a bifix code $Z$. Let $X = Z \cap F$. Then $\delta_X$ and $\delta$ have the same restriction to $F$. Since $\delta$ is bounded on $F$, the same holds for $\delta_X$. This implies that the code $X$ is an $F$-thin and $F$-maximal bifix code by Theorem 4.2.8. Since the code $Y$ is not an $F$-maximal bifix code, the $F$-parse enumerator $\delta_Y$ is not bounded. Consequently $\max\{\delta(w) \mid w \in F\} = d$, showing that $X$ has $F$-degree $d$. Let us prove finally that $Y$ is the kernel of $X$. Since, by condition (ii), $\max\{\delta_Y(y) \mid y \in Y\} \le d - 1$, we have $Y \subset I(X)$.

Moreover, for $w \in I(X)$ we have $\delta_X(w) = \delta_Y(w)$. Let $L$ (resp. $L_Y$) be the indicator of $X$ (resp. of $Y$). Since $1 - \underline{X} = (1 - \underline{A})L(1 - \underline{A})$ and $1 - \underline{Y} = (1 - \underline{A})L_Y(1 - \underline{A})$ by Equation (4.3), we conclude that for $w \in I(X)$, we have $(\underline{X}, w) = (\underline{Y}, w)$. This implies that if $w \in I(X)$, then $w$ is in $X$ if and only if $w$ is in $Y$. Thus $K(X) = I(X) \cap X = I(X) \cap Y = Y$ and $Y$ is the kernel of $X$. ■






Figure 4.4: The three $F$-maximal bifix codes of $F$-degree 2 in the Fibonacci set $F$.

Example 4.3.13 Let $A = \{a, b\}$ and let $F \subset A^*$ be the Fibonacci set. There are three $F$-maximal bifix codes of $F$-degree 2 in $F$, represented on Figure 4.4. Indeed, by Theorem 4.3.12, the possible kernels are $\emptyset$, $\{a\}$ and $\{b\}$.



4.4 Finite maximal bifix codes

The following generalizes Theorem 6.5.2 of

Theorem 4.4.1 For any recurrent set $F$ and any integer $d \ge 1$ there is a finite number of finite $F$-maximal bifix codes $X \subset F$ of $F$-degree $d$.

Proof. The only $F$-maximal bifix code of $F$-degree 1 is $F \cap A$. Arguing by induction on $d$, assume that there are only finitely many finite $F$-maximal bifix codes of $F$-degree $d$. Each finite $F$-maximal bifix code $X \subset F$ of $F$-degree $d + 1$ is determined by its kernel which is a subset of its derived code $X'$. Since $X'$ is a finite $F$-maximal bifix code of $F$-degree $d$, there are only a finite number of kernels and we are done. ■

Example 4.4.2 Let $A = \{a, b\}$ and let $F$ be the set of words without factor $bb$. There are two finite $F$-maximal bifix codes of $F$-degree 2, namely the code $\{aa, ab, ba\}$ with empty kernel and the code $\{aa, aba, b\}$ with kernel $b$. The code of $F$-degree 2 with kernel $a$ is $a \cup ba^+b$, and thus is infinite.

The following result shows that the case of a uniformly recurrent set contrasts with the case $F = A^*$ since in $A^*$, as soon as $\mathrm{Card}(A) \ge 2$, there exist infinite maximal bifix codes of degree 2 and thus of all degrees $d \ge 2$; see e.g. Example 6.4.7 for degree 2 and Theorem 6.4.6 for the general case.

Theorem 4.4.3 Let $F$ be a uniformly recurrent set. Any $F$-thin bifix code $X \subset F$ is finite. Any finite bifix code is contained in a finite $F$-maximal bifix code.

Proof. Let $X \subset F$ be an $F$-thin bifix code. Since $X$ is $F$-thin, there exists a word $w \in F \setminus I(X)$. Since $F$ is uniformly recurrent there is an integer $r$ such that $w$ is a factor of every word in $F_r = F \cap A^r$. Assume $F_k \cap X \ne \emptyset$ for some $k \ge r + 2$, and let $x \in F_k \cap X$. Set $x = pqs$, with $q \in F_r \cap I(X)$, and $p$, $s$ nonempty. Then $w$ is a factor of $q$, hence $w$ is in $I(X)$, a contradiction. We deduce that each $x$ in $X$ has length at most $r + 1$. Thus $X$ is finite.

Let $X \subset F$ be a finite bifix code which is not $F$-maximal. Let $d = \max\{\delta_X(x) \mid x \in X\}$. By Theorem 4.3.12, $X$ is the kernel of an $F$-thin and $F$-maximal bifix code $Z \subset F$ of $F$-degree $d + 1$. By the previous argument, $Z$ is finite. ■

By Theorem 6.6.1 of any rational bifix code is contained in a maximal 
rational bifix code. We have seen that the situation is simpler for bifix codes in 
uniformly recurrent sets. 

Example 4.4.4 Let $F$ be the Fibonacci set. Let $X = \{a, bab\}$. Then $X$ is contained in the bifix code $\{a, bab, baab\}$ which has $F$-degree 2 (see Figure 4.4). It is also the kernel of $\{a, baabaab, baababaab, bab\}$ which is a bifix code of $F$-degree 3 (see Table 5.1).



The following is a generalization of Proposition 6.2.10 in . The equality $d(Y) = d(X)$ is stated as a comment following Proposition 6.3.9 (page 243), in a more general framework.

Proposition 4.4.5 Let $F$ be a recurrent set, let $X \subset F$ be a finite $F$-maximal bifix code and let $w$ be a nonempty word in $F$. Let $G = Xw^{-1}$ and $D = w^{-1}X$. If

$$G \ne \emptyset, \quad D \ne \emptyset, \quad \text{and} \quad Gw \cap wD = \emptyset,$$

then the set

$$Y = (X \cup w \cup (GwD \cap F)) \setminus (Gw \cup wD) \tag{4.14}$$

is a finite $F$-maximal bifix code with the same $F$-degree as $X$.

We use in the proof the following proposition which is an extension of Corollary 3.4.7 of

Proposition 4.4.6 Let $F$ be a recurrent set and let $X \subset F$ be an $F$-maximal prefix code. Let $X = X_1 \cup X_2$ be a partition of $X$ into two prefix codes and let $Y$ be a finite prefix code such that $Y \cap x^{-1}F$ is $x^{-1}F$-maximal for all $x \in X_2$. Then the set $Z = X_1 \cup (X_2Y \cap F)$ is an $F$-maximal prefix code.

Proof. We first prove that $Z$ is a prefix code. Let $z$ and $z'$ be distinct words in $Z$. We show that they are not prefix-comparable. Since $X_1$ is a prefix code, this holds if both words are in $X_1$. Assume next that $z \in X_2Y$. Then $z = xy$ with $x \in X_2$ and $y \in Y$.

If $z'$ is in $X_1$, then $z'$ and $x$ are not prefix-comparable because they are distinct since $z' \in X_1$ and $x \in X_2$, and so $z$ and $z'$ are not prefix-comparable.

If $z' \in X_2Y$, set $z' = x'y'$ with $x' \in X_2$ and $y' \in Y$. Either $x$ and $x'$ are not prefix-comparable, and then neither are $z$ and $z'$, or $x = x'$. In the latter case, $y$ and $y'$ are not prefix-comparable because $Y$ is a prefix code, and again $z$ and $z'$ are not prefix-comparable. Thus $Z$ is a prefix code.

Let us show that $Z$ is $F$-maximal. Let $u \in F$. Since $X$ is an $F$-maximal prefix code, there is an $x \in X$ which is prefix-comparable with $u$. If $x$ is in $X_1$, then $x \in Z$ and thus $u$ is prefix-comparable with a word of $Z$. Otherwise, we have $x \in X_2$.

Suppose first that $u$ is a prefix of $x$. Since $Y \cap x^{-1}F$ is a finite $x^{-1}F$-maximal prefix code, it is not empty and $u$ is a prefix of $xv$ for every $v \in Y \cap x^{-1}F$.

Suppose next that $u = xv$ for some word $v$. Since $v$ is in $x^{-1}F$ and since $Y \cap x^{-1}F$ is an $x^{-1}F$-maximal prefix code, the word $v$ is prefix-comparable with some $y \in Y \cap x^{-1}F$. Thus $u$ is prefix-comparable with $xy \in Z$. ■



Proof of Proposition 4.4.5. The condition $G \ne \emptyset$ (resp. $D \ne \emptyset$) means that $w$ is a suffix of $X$ (resp. a prefix of $X$). The condition $Gw \cap wD = \emptyset$ implies that $w$ is not in $X$.

By Theorem 4.2.2, the set $X$ is an $F$-maximal prefix code. By Proposition 3.3.8, the set $Y_1 = (X \cup w) \setminus wD$ is an $F$-maximal prefix code. Next, we have

$$Y = (Y_1 \setminus Gw) \cup (GwD \cap F).$$

We show that $Y$ is an $F$-maximal prefix code, by applying Proposition 4.4.6. Indeed, consider the partition $Y_1 = X_1 \cup X_2$ with $X_1 = Y_1 \setminus Gw$ and $X_2 = Gw$. Then $Y = X_1 \cup (X_2D \cap F)$. Clearly $D$ is a finite $w^{-1}F$-maximal prefix code. Since $(gw)^{-1}F$ is a subset of $w^{-1}F$ for all $g \in G$, the set $D \cap (gw)^{-1}F$ is a finite $(gw)^{-1}F$-maximal prefix code for all $g \in G$ by a proposition of Section 3.3. So the claim follows from Proposition 4.4.6. This proves that $Y$ is an $F$-maximal prefix code. Since $Y$ is also a suffix code, it follows that $Y$ is an $F$-maximal bifix code by Theorem 4.2.2.




Figure 4.5: Construction of $\varphi(p)$ (second case).

To show that $X$ and $Y$ have the same degree, consider a word $u \in F$ which is not an internal factor of $X$ nor $Y$. Such a word exists since $X$ and $Y$ are finite. Let $P$ (resp. $Q$) be the set of proper prefixes of the words of $X$ (resp. $Y$). We define a bijection $\varphi$ between the set $P(u)$ of suffixes of $u$ which are in $P$ and the set $Q(u)$ of suffixes of $u$ which are in $Q$. This will imply that $d_F(X) = d_F(Y)$ by the dual of Proposition 4.1.3.

Let $p \in P$ be a suffix of $u$ and set $u = rp$. If $w$ is not a prefix of $p$, then $p$ is in $Q$. Otherwise, set $p = ws$. Since the words in $P$ starting with $w$ are all prefixes of $wD$, the word $s$ is a proper prefix of $D$. Since $G$ is an $Fw^{-1}$-maximal suffix code, $r$ is suffix-comparable with a word of $G$. If $r$ is a proper suffix of $G$, then $u = rws$ is an internal factor of $GwD$, a contradiction. Thus $r$ has a suffix $g \in G$. This suffix is unique because $G$ is a suffix code. Since $gp = gws$, the word $gp$ is a proper prefix of $GwD$, and thus a proper prefix of $Y$. Thus $gp \in Q(u)$. We set (see Figure 4.5)

$$\varphi(p) = \begin{cases} p & \text{if } p \notin wA^*, \\ gp & \text{if } p \in wA^* \text{ and } g \in G \text{ is the suffix of } r \text{ in } G. \end{cases}$$

Thus $\varphi$ maps $P(u)$ into $Q(u)$. We show that it is injective. Suppose that $\varphi(p) = \varphi(p')$ for some $p, p' \in P(u)$. Assume that $\varphi(p) = gp$ and $\varphi(p') = g'p'$ with $g, g' \in G$. Since $p$ and $p'$ start with $w$, the word $gp = g'p'$ starts with the words $gw$ and $g'w$ which are in $X$. This shows that $g = g'$ and thus $p = p'$. Assume next that $p = g'p'$ with $g' \in G$ and $p' \in wA^*$. But then $g'w$ is a prefix of $p$ with $p$ in $P$ and $g'w$ in $X$, a contradiction.

Figure 4.6: Reconstruction of the factorization.

To show that $\varphi$ is surjective, consider $q \in Q(u)$. Assume first that $q$ has a prefix $x$ in $X$ (see Figure 4.6). By Equation (4.14), one has $x = gw$ and $q = gws$ for some $g \in G$ and $s$ a proper prefix of a word $d$ in $D$. Thus $ws$ is a proper prefix of $wD \subset X$, and consequently $ws$ is a proper prefix of $X$. Since $ws$ is a suffix of $q$, it is a suffix of $u$. Thus $ws \in P(u)$. Set $u = rws$. Then $g$ is a suffix of $r$. Moreover $ws \in wA^*$. Consequently $\varphi(ws) = q$.

Finally, if $q$ has no prefix in $X$, then $q$ is a proper prefix of $X$. Moreover, since $q$ is a prefix of $Y$, either $q$ is a proper prefix of $w$ or $q$ is not a prefix of $wD$. In both cases, $w$ is not a prefix of $q$ and therefore $\varphi(q) = q$. Thus $\varphi$ is surjective. ■



The set $Y$ defined by Equation (4.14) is said to be obtained from $X$ by internal transformation (with respect to $w$).

Example 4.4.7 Let $F$ be the Fibonacci set. The set $X = \{aa, ab, ba\}$ is an $F$-maximal bifix code of $F$-degree 2. Then $Y = \{aa, aba, b\}$ is a bifix code of $F$-degree 2 which is obtained from $X$ by internal transformation with respect to $w = b$. Indeed, here $G = D = \{a\}$, $Gw = \{ab\}$, $wD = \{ba\}$ and $GwD = \{aba\}$.

The following theorem is due to Cesari. It is Theorem 6.5.4 in

Theorem 4.4.8 For any finite maximal bifix code $X$ over $A$ of degree $d$, there is a sequence of internal transformations which, starting from the code $A^d$, gives the code $X$.

Theorem 4.4.8 has been generalized to finite $F$-maximal bifix codes when $F$ is the set of paths in a strongly connected graph. It does not hold in every recurrent, or even uniformly recurrent, set, as shown by the following example.

Example 4.4.9 Let $F$ be the Fibonacci set. The set $X = \{a, bab, baab\}$ is a finite bifix code of $F$-degree 2. It cannot be obtained by a sequence of internal transformations from the code $A^2 \cap F = \{aa, ab, ba\}$. Indeed, the only internal transformation which can be realized is with respect to $w = b$. The result is $\{aa, aba, b\}$ by Example 4.4.7. Next, no internal transformation can be realized from this code. See also Figure 4.4.
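Equation (4.14) and the hypotheses of Proposition 4.4.5 translate directly into set operations, which makes Examples 4.4.7 and 4.4.9 easy to replay. The sketch below is illustrative: the Fibonacci set is sampled by the short factors of a long prefix of the fixed point of a -> ab, b -> a, the function returns `None` when one of the three hypotheses fails, and candidate words $w$ are restricted to length at most 3, since a longer $w$ cannot be a suffix of a word of the codes considered here. All names are hypothetical.

```python
def fibonacci_prefix(iterations=15):
    """Prefix of the Fibonacci word, the fixed point of a -> ab, b -> a."""
    word = "a"
    for _ in range(iterations):
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word


def factors(word, max_len):
    """All factors of `word` of length <= max_len, including the empty word."""
    return {word[i:i + n] for n in range(max_len + 1)
            for i in range(len(word) - n + 1)}


def internal_transformation(X, w, F):
    """Set Y of Equation (4.14), or None if G or D is empty or Gw meets wD."""
    G = {x[:-len(w)] for x in X if x.endswith(w)}   # G = X w^{-1}
    D = {x[len(w):] for x in X if x.startswith(w)}  # D = w^{-1} X
    Gw = {g + w for g in G}
    wD = {w + d for d in D}
    if not G or not D or (Gw & wD):
        return None
    GwD = {g + w + d for g in G for d in D}
    return (X | {w} | (GwD & F)) - (Gw | wD)


def admissible_words(X, F, max_len=3):
    """Nonempty words of F (up to max_len) for which the transformation applies."""
    return sorted(w for w in F
                  if 0 < len(w) <= max_len
                  and internal_transformation(X, w, F) is not None)


F = factors(fibonacci_prefix(), 8)
X = {"aa", "ab", "ba"}                      # the code A^2 n F
Y = internal_transformation(X, "b", F)      # Example 4.4.7

print(sorted(Y))
print(admissible_words(X, F))   # only w = b applies to A^2 n F
print(admissible_words(Y, F))   # nothing applies to {aa, aba, b}
```

Since the only codes reachable from $A^2 \cap F$ are then $\{aa, ab, ba\}$ and $\{aa, aba, b\}$, the code $\{a, bab, baab\}$ is never produced, in agreement with Example 4.4.9.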



A more general form of internal transformation is described in Proposition 6.2.8 of the cited reference. We do not know whether its adaptation to finite $F$-maximal bifix codes allows one to obtain all finite $F$-maximal bifix codes of $F$-degree $d$ starting with the code $A^d \cap F$.



5 Bifix codes in Sturmian sets

In this section, we study bifix codes in Sturmian sets. This time, the situation is completely specific. First of all, as we have already seen, any $F$-thin bifix code included in a uniformly recurrent set $F$ is finite (Theorem 4.4.3). Next, in a Sturmian set $F$, any bifix code of finite $F$-degree $d$ on $k$ letters has $(k - 1)d + 1$ elements (Theorem 5.2.1). Since $A^d$ is a bifix code of degree $d$, this generalizes the fact that $\mathrm{Card}(F \cap A^d) = (k - 1)d + 1$ for all $d \ge 1$.

Additionally, if an infinite word $x$ is $X$-stable, that is if, for some thin maximal bifix code $X$, one has $d_{F(y)}(X) = d_{F(x)}(X)$ for all suffixes $y$ of $x$, then the inequality $\mathrm{Card}(X \cap F(x)) \le d_{F(x)}(X)$ implies that $x$ is ultimately periodic (Theorem 5.3.2).



5.1 Sturmian sets

Let $F$ be a factorial set on the alphabet $A$. Recall that a word $w$ is strict right-special if $wA \subset F$. It is strict left-special if $Aw \subset F$. A suffix of a (strict) right-special word is (strict) right-special, a prefix of a (strict) left-special word is (strict) left-special.

A set of words $F$ is called Sturmian if it is the set of factors of a strict episturmian word. By Proposition 2.3.3 a Sturmian set $F$ is uniformly recurrent. Moreover, every right-special (left-special) word in $F$ is strict.

The following statement gives a direct definition of Sturmian sets.

Proposition 5.1.1 A set $F$ is Sturmian if and only if it is uniformly recurrent and

(i) it is closed under reversal,

(ii) for each $n$, there is exactly one right-special word in $F$ of length $n$, and this right-special word is strict.

Proof. If $F = F(x)$ for some strict episturmian word, then the conclusions of the proposition hold.

Conversely, assume that $F$ has the required properties. For each $n$, the reversal of the strict right-special word of length $n$ is a strict left-special word. Since all these left-special words are prefixes one of the other, there is an infinite word $x$ such that all its prefixes are these strict left-special words. Clearly, $F(x) \subset F$. To show that $x$ is strict episturmian, we verify that $F(x)$ is closed under reversal. Let $u \in F(x)$. Then $u \in F$. Since $F$ is uniformly recurrent, there is an integer $m$ such that $u$ is a factor of the right-special word $w$ of length $m$. Consequently the reversal $\tilde{u}$ of $u$ is a factor of the left-special word $\tilde{w}$ of length $m$, and therefore is in $F(x)$.

To prove that $F \subset F(x)$, let $u \in F$. Since $F$ is uniformly recurrent, there is an integer $m$ such that $u$ is a factor of the left-special word $w$ of length $m$. Since $w$ is a prefix of $x$, this shows that $u \in F(x)$. ■
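The conditions of Proposition 5.1.1, and the infinite word built in its proof, can be observed on the Fibonacci set. The following sketch assumes that this set is the factor set of the fixed point of a -> ab, b -> a and that a prefix of length 1597 contains all factors of length at most 12; it then lists the special words of each small length. Names are illustrative.

```python
def fibonacci_prefix(iterations=15):
    """Prefix of the Fibonacci word, the fixed point of a -> ab, b -> a."""
    word = "a"
    for _ in range(iterations):
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word


def factors(word, max_len):
    """All factors of `word` of length <= max_len, including the empty word."""
    return {word[i:i + n] for n in range(max_len + 1)
            for i in range(len(word) - n + 1)}


def right_special(F, n, alphabet="ab"):
    """Words w of length n with wA contained in F (strict right-special)."""
    return sorted(w for w in F
                  if len(w) == n and all(w + a in F for a in alphabet))


def left_special(F, n, alphabet="ab"):
    """Words w of length n with Aw contained in F (strict left-special)."""
    return sorted(w for w in F
                  if len(w) == n and all(a + w in F for a in alphabet))


fib = fibonacci_prefix()
F = factors(fib, 12)

closed_under_reversal = all(w[::-1] in F for w in F)
print(closed_under_reversal)
for n in range(1, 8):
    # One right-special word per length; its reversal is the left-special
    # word, which is a prefix of the Fibonacci word itself.
    print(n, right_special(F, n), left_special(F, n), fib[:n])
```

On this example the left-special word of length $n$ coincides with the prefix of length $n$ of the Fibonacci word, which is the word $x$ reconstructed in the proof.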

The following statement is a direct consequence of the previous proof.

Proposition 5.1.2 Let $F$ be a Sturmian set of words. There is a unique strict standard episturmian infinite word $s$ such that $F = F(s)$.

As a consequence of Proposition 5.1.2, for every left-special word $w$ of a Sturmian set $F$, exactly one of the words $wa$, for $a \in A$, is left-special in $F$. Symmetrically, for every right-special word $w$ in $F$, exactly one of the words $aw$ for $a \in A$ is right-special in $F$. More generally, for every $n \ge 1$ there is exactly one word $u$ of length $n$ such that $uw$ is a right-special word in $F$.



Proposition 5.1.3 Any word in a Sturmian set $F$ is a prefix of some right-special word in $F$.

Proof. Let indeed $u \in F$. Since $F$ is uniformly recurrent, there is an integer $n$ such that $u$ is a factor of any word in $F$ of length $n$. Let $w$ be the right-special word of length $n$. Then $u$ is a factor of $w$, thus $w = pus$ for some words $p, s$. Since $w$ is right-special, its suffix $us$ is also right-special. Thus $u$ is a prefix of a right-special word. ■



The following example shows that for a Sturmian set $F$, there exist bifix codes $X \subset F$ which are not $F$-thin (we have seen such an example for a uniformly recurrent but not Sturmian set in Example 4.2.1

Example 5.1.4 Let $F$ be a Sturmian set. Consider the following sequence $(x_n)_{n \ge 1}$ of words of $F$. Set $x_1 = a$, for some $a \in A$.

Suppose inductively that $x_1, \dots, x_n$ have been defined in such a way that $X_n = \{x_1, x_2, \dots, x_n\}$ is bifix and not $F$-maximal bifix. Define $x_{n+1}$ as follows. By Theorem 4.2.2, $X_n$ is not right $F$-complete, thus there is a word $u$ in $F$ which is incomparable for the prefix order with the words of $X_n$. By Proposition 5.1.3 the word $u$ is a prefix of a right-special word $v$ in $F$. Symmetrically, since $X_n$ is not an $F$-maximal bifix code, there is a word $w \in F$ which is incomparable with the words of $X_n$ for the suffix order. Since $F$ is recurrent, there is a word $t$ such that $vatw \in F$. Then we choose $x_{n+1} = vatw$.

The set $X_{n+1} = X_n \cup x_{n+1}$ is a bifix code since $x_{n+1}$ is incomparable with the words of $X_n$ for the prefix and for the suffix order. It is not an $F$-maximal prefix code since $vb$, for all letters $b \ne a$, is incomparable for the prefix order with the words of $X_{n+1}$: indeed, its prefix $u$ is incomparable for the prefix order with all words in $X_n$ and $vb$ is incomparable with $x_{n+1}$. Since it is finite, it is not an $F$-maximal bifix code by Theorem 4.2.2. The infinite set $X = \{x_1, x_2, \dots\}$ is a bifix code included in $F$ and it is not $F$-thin by Theorem 4.4.3.



Proposition 5.1.5 Let $F$ be a Sturmian set and let $X \subset F$ be a prefix code. Then $X$ contains at most one left-special word. If $X$ is a finite $F$-maximal prefix code, it contains exactly one left-special word.

Proof. Assume on the contrary that $x, y \in X$ are two left-special words. We may assume that $|x| \leq |y|$. Let $x'$ be the prefix of $y$ of length $|x|$. Then $x'$ is left-special and thus $x, x'$ are two left-special words of the same length. This implies that $x = x'$. Thus $x$ is a prefix of $y$. Since $X$ is prefix, this implies $x = y$.

Assume now that $X$ is a finite $F$-maximal prefix code. Let $n$ be the maximal length of the words in $X$. Let $u \in F$ be the left-special word of length $n$. Since $XA^*$ is right $F$-dense, there is a prefix $x$ of $u$ which is in $X$. Thus $x$ is a left-special element of $X$. It is unique by the previous statement. ■

A dual of Proposition 5.1.5 holds for suffix codes and right-special words.



5.2 Cardinality

The following result shows that Theorem 4.4.3 can be made much more precise for Sturmian sets.

Theorem 5.2.1 Let $F$ be a Sturmian set on an alphabet with $k$ letters. For any finite $F$-maximal bifix code $X \subset F$, one has $\mathrm{Card}(X) = (k-1)\,d_F(X) + 1$.

The following corollary is a strong generalization of a result related to Sturmian words.

Corollary 5.2.2 Let $x$ be a Sturmian word over $A = \{a, b\}$, and let $X \subset A^+$ be a finite maximal bifix code of degree $d$. Then $\mathrm{Card}(X \cap F(x)) = d + 1$.

Indeed, since $A^d$ is a finite maximal bifix code of degree $d$, this corollary (re)proves that any Sturmian word $x$ has $d + 1$ factors of length $d$, and it extends this to an arbitrary finite maximal bifix code of degree $d$. A similar extension holds for strict episturmian words.

Proof of Corollary 5.2.2. Set $F = F(x)$. In view of Theorem 4.2.11, one has $d = d_F(X \cap F)$. Consequently, by the formula of Theorem 5.2.1 with $k = 2$, $\mathrm{Card}(X \cap F) = d_F(X \cap F) + 1 = d + 1$.

The proof of Theorem 5.2.1 uses two lemmas.



Lemma 5.2.3 Let $F$ be a Sturmian set. Let $X \subset F$ be a finite bifix code of finite $F$-degree $d$ and let $P$ be the set of proper prefixes of $X$. There exists a right-special word $u \in F$ such that $\delta_X(u) = d$. The $d$ suffixes of $u$ which are in $P$ are the right-special words contained in $P$.

Proof. Let $n \geq 1$ be larger than the length of the words of $X$. By definition, there is a right-special word $u$ of length $n$. Then $u$ is not a factor of a word of $X$. By Theorem 4.2.8 it implies that $\delta_X(u) = d_F(X)$.

By the dual of Proposition 4.1.3, the word $u$ has $d_F(X)$ suffixes which are in $P$. They are all right-special words. Furthermore, any right-special word $p$ contained in $P$ is a suffix of $u$. Indeed, the suffix of $u$ of the same length as $p$ is the unique right-special word of this length. ■

The next lemma is a well-known property of trees translated into the language of prefix codes. Let $X$ be a prefix code or the set $\{1\}$ and let $P$ be the set of proper prefixes of $X$. For $p \in P$, let $d(p) = \mathrm{Card}\{a \in A \mid pa \in P \cup X\}$.

Lemma 5.2.4 Let $A$ be an alphabet with $k$ letters. Let $X \subset A^*$ be a finite prefix code or the set $\{1\}$ and let $P$ be the set of proper prefixes of the words of $X$. Assume that for all $p \in P$, $d(p) = k$ or $1$. Let $Q_X = \{p \in P \mid d(p) = k\}$. Then $\mathrm{Card}(X) = (k-1)\,\mathrm{Card}(Q_X) + 1$.

Proof. Let us prove the property by induction on the maximal length $n$ of the words in $X$. The property is true for $n = 0$ since in this case $X = \{1\}$ and $P = Q_X = \emptyset$. Assume $n \geq 1$. If $1 \notin Q_X$, then all words of $X$ begin with the same letter $a$. We have then $X = aY$, where $Y$ is a prefix code or the set $\{1\}$ and $\mathrm{Card}(Q_Y) = \mathrm{Card}(Q_X)$. Hence, by induction hypothesis, $\mathrm{Card}(X) = \mathrm{Card}(Y) = (k-1)\,\mathrm{Card}(Q_Y) + 1 = (k-1)\,\mathrm{Card}(Q_X) + 1$. Otherwise, $X = \bigcup_{a \in A} aX_a$. Set $t_a = \mathrm{Card}(Q_{X_a})$. We have $\sum_{a \in A} t_a = \mathrm{Card}(Q_X) - 1$. By induction hypothesis, $\mathrm{Card}(X_a) = (k-1)\,t_a + 1$. Therefore,
$$\mathrm{Card}(X) = \sum_{a \in A} \mathrm{Card}(X_a) = \sum_{a \in A} (k-1)\,t_a + k = (k-1)\,\mathrm{Card}(Q_X) + 1.$$ ■
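The counting identity of Lemma 5.2.4 is easy to test on a concrete prefix code. The sketch below is illustrative only; the function names `proper_prefixes` and `count_by_forks` are hypothetical, and the empty string stands for the empty word $1$.

```python
def proper_prefixes(code):
    """Set P of proper prefixes of the words of the code (the empty word is '')."""
    return {x[:i] for x in code for i in range(len(x))}


def count_by_forks(code, alphabet):
    """Return (Card(X), (k - 1) * Card(Q_X) + 1) for a finite prefix code X.

    Raises ValueError if X is not a prefix code or if some d(p) is not 1 or k.
    """
    k = len(alphabet)
    prefixes = proper_prefixes(code)
    if prefixes & set(code):
        raise ValueError("not a prefix code")
    nodes = prefixes | set(code)
    # d(p) = number of letters a such that pa is in P or in X
    degree = {p: sum((p + a) in nodes for a in alphabet) for p in prefixes}
    if any(d not in (1, k) for d in degree.values()):
        raise ValueError("hypothesis d(p) in {1, k} fails")
    forks = sum(1 for d in degree.values() if d == k)   # Card(Q_X)
    return len(set(code)), (k - 1) * forks + 1
```

For instance, `count_by_forks(["a", "baab", "babaabaabab", "babaabab"], "ab")` counts three fork nodes (the empty word, `ba` and `babaaba`), so both components of the result are 4.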



Proof of Theorem 5.2.1. Let $P$ be the set of proper prefixes of $X$. An element $p$ of $P$ satisfies $pA \subset P \cup X$ if and only if it is right-special. Thus the conclusion follows directly by Lemmas 5.2.3 and 5.2.4. ■

code                               kernel      derived code
aab, aba, baa, bab                 ∅           aa, ab, ba
aa, aba, baab, bab                 aa          aa, ab, ba
aaba, ab, baa, baba                ab          aa, ab, ba
aab, abaa, abab, ba                ba          aa, ab, ba
aa, ab, baaba, baba                aa, ab      aa, ab, ba
aa, abaab, abab, ba                aa, ba      aa, ab, ba
aabaa, aababaa, ab, ba             ab, ba      aa, ab, ba
a, baabaab, baabab, babaab         a           a, baab, bab
a, baab, babaabaabab, babaabab     a, baab     a, baab, bab
a, baabaab, baababaab, bab         a, bab      a, baab, bab
aaba, abaa, ababa, b               b           aa, aba, b
aa, abaaba, ababa, b               aa, b       aa, aba, b
aabaa, aababaa, aba, b             aba, b      aa, aba, b

Table 5.1: The 13 $F$-maximal bifix codes of $F$-degree 3 in the Fibonacci set $F$.



Example 5.2.5 Let $F$ be the Fibonacci set. We have seen in Example 4.3.13 that there are 3 $F$-maximal bifix codes of $F$-degree 2. It appears that there are 13 $F$-maximal bifix codes of $F$-degree 3, listed in Table 5.1. These codes are determined by their derived $F$-maximal bifix codes of $F$-degree 2, and by the choice of the kernel. The construction of the code can be done by Theorem 4.3.11. By Theorem 5.2.1, all these codes have 4 elements.
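The 13 codes can be sanity-checked mechanically. The sketch below rests on several assumptions of ours: the codes are transcribed as we read them from Table 5.1 (the last one is taken to be {aabaa, aababaa, aba, b}, since a set containing both aba and aababa would not be a suffix code); membership in $F$ is tested against a finite prefix of the Fibonacci word; and $F$-maximality is only tested through the finite condition that every factor of maximal length has a prefix and a suffix in the code. All function names are hypothetical.

```python
def fibonacci_prefix(iterations=15):
    """Prefix of the Fibonacci word, by iterating a -> ab, b -> a."""
    w = "a"
    for _ in range(iterations):
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w


FIB = fibonacci_prefix()

# Codes as read from Table 5.1 (transcription is an assumption of this sketch).
TABLE_5_1 = [
    "aab aba baa bab", "aa aba baab bab", "aaba ab baa baba",
    "aab abaa abab ba", "aa ab baaba baba", "aa abaab abab ba",
    "aabaa aababaa ab ba", "a baabaab baabab babaab",
    "a baab babaabaabab babaabab", "a baabaab baababaab bab",
    "aaba abaa ababa b", "aa abaaba ababa b", "aabaa aababaa aba b",
]


def is_bifix(code):
    """No word of the code is a proper prefix or suffix of another one."""
    return not any(x != y and (y.startswith(x) or y.endswith(x))
                   for x in code for y in code)


def is_complete_in_F(code):
    """Every factor of maximal length has a prefix and a suffix in the code."""
    m = max(map(len, code))
    window = {FIB[i:i + m] for i in range(len(FIB) - m + 1)}
    return all(any(w.startswith(x) for x in code) and
               any(w.endswith(x) for x in code) for w in window)


def check(code):
    return (all(x in FIB for x in code) and is_bifix(code)
            and is_complete_in_F(code) and len(code) == 4)
```

Under these assumptions, `all(check(c.split()) for c in TABLE_5_1)` is expected to hold, in agreement with the cardinality $d + 1 = 4$ given by Theorem 5.2.1.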




Figure 5.1: The F-maximal bifix code of F-degree 3 with kernel {a, baab}. 



Example 5.2.6 We may illustrate the proof of Theorem 5.2.1 on the code $X = \{a, baab, babaabaabab, babaabab\}$ (see Table 5.1). According to Lemma 5.2.3, the right-special word $ababaaba$ (which is the reversal of the prefix $abaababa$ of the Fibonacci word) has exactly three suffixes which are proper prefixes of words of $X$, namely $1$, $ba$ and $babaaba$ (these are the "fork nodes", that is the nodes with two children, indicated in black on Figure 5.1). This implies, by Lemma 5.2.4, that $X$ has four elements.



The following example shows that Theorem 5.2.1 is not true for the set of factors of an episturmian word which is not strict.

Example 5.2.7 Set $A = \{a, b, c\}$. Let $y$ be the Fibonacci word and let $x = \psi_c(y)$ be the infinite word of Example 2.3.7. It is an episturmian word which is not strict. Set $F = F(x)$. Let $\varphi : A^* \to G$ be the morphism from $A^*$ onto the group $G = (\mathbb{Z}/2\mathbb{Z})^3$ defined by $\varphi(a) = (1,0,0)$, $\varphi(b) = (0,1,0)$ and $\varphi(c) = (0,0,1)$. Let $Z$ be the group code such that $Z^* = \varphi^{-1}(0,0,0)$. Since $G$ has 8 elements, the degree of $Z$ is 8 (see Proposition 6.1.5 below). The bifix code $X = Z \cap F$ has 10 elements obtained by inserting $c$ in two possible ways in the 5 words of the bifix code $Z \cap F(y)$. The latter has degree 4 by Theorem 5.2.1. The bifix code $X = Z \cap F$ is given in Figure 5.2. The numbering of the nodes is for later use, in Example 7.2.7.

Figure 5.2: An $F$-maximal bifix code with 10 elements. The numbers in the vertices are for later use.

By Theorem 4.2.11, $X$ is an $F$-maximal bifix code. Its $F$-degree is 8. Indeed, the word $acbcacbc$ has 8 parses. Thus Theorem 5.2.1 is not true in this case.
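The count of 8 parses can be reproduced by brute force. This is a sketch under assumptions: a parse of $w$ is taken to be a triple $(s, z, p)$ with $w = szp$, where $z \in Z^*$, $s$ has no suffix in $Z$ and $p$ has no prefix in $Z$ (the convention suggested by the parses listed in Example 5.3.7); parses are counted with respect to the group code $Z$, which for a word of $F$ should agree with parses with respect to $X = Z \cap F$; and the helper names are ours.

```python
PHI = {"a": (1, 0, 0), "b": (0, 1, 0), "c": (0, 0, 1)}
ZERO = (0, 0, 0)


def phi(word):
    """Image of a word in (Z/2Z)^3."""
    total = (0, 0, 0)
    for letter in word:
        total = tuple((t + v) % 2 for t, v in zip(total, PHI[letter]))
    return total


def in_Z_star(word):
    return phi(word) == ZERO


def has_suffix_in_Z(s):
    # s has a suffix in Z iff some nonempty suffix of s lies in Z*
    return any(in_Z_star(s[i:]) for i in range(len(s)))


def has_prefix_in_Z(p):
    # p has a prefix in Z iff some nonempty prefix of p lies in Z*
    return any(in_Z_star(p[:i]) for i in range(1, len(p) + 1))


def parses(w):
    """All triples (s, z, p) with w = szp, z in Z*, s and p as described above."""
    return [(w[:i], w[i:j], w[j:])
            for i in range(len(w) + 1)
            for j in range(i, len(w) + 1)
            if in_Z_star(w[i:j])
            and not has_suffix_in_Z(w[:i])
            and not has_prefix_in_Z(w[j:])]
```

Here `len(parses("acbcacbc"))` evaluates to 8: the eight proper prefixes of the word have pairwise distinct images under $\varphi$, and each image class contributes exactly one parse.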



As a consequence of Theorem 5.2.1, an internal transformation does not change the cardinality of a finite $F$-maximal bifix code for a Sturmian set $F$. Indeed, by Proposition 4.4.5, an internal transformation preserves the $F$-degree.

Actually, if $Y$ is obtained from $X$ by internal transformation with respect to $w$, we have

$$Y = (X \cup w \cup (GwD \cap F)) \setminus (Gw \cup wD) \qquad (5.1)$$

and

$$\mathrm{Card}(Y) = \mathrm{Card}(X) + 1 + \mathrm{Card}(GwD \cap F) - \mathrm{Card}(G) - \mathrm{Card}(D).$$



The fact that internal transformations preserve the cardinality can be proved directly by the following statement. This statement applies to the internal transformation (5.1) because $Gw \cup wD$ is a bifix code, which implies property (i), and $DA^* = w^{-1}XA^*$ (resp. $A^*G = A^*Xw^{-1}$), which implies property (ii) (resp. (iii)).

Proposition 5.2.8 Let $F$ be a Sturmian set, let $w \in F$ be a nonempty word and let $D, G$ be finite sets such that

(i) any word $u$ has at most one factorization $u = gwd$ with $g \in G$ and $d \in D$,

(ii) $wD$ is a prefix code contained in $F$ and $DA^*$ is right $w^{-1}F$-dense,

(iii) $Gw$ is a suffix code contained in $F$ and $A^*G$ is left $Fw^{-1}$-dense.

Then $\mathrm{Card}(GwD \cap F) = \mathrm{Card}(G) + \mathrm{Card}(D) - 1$.

Proof. Let $V = (1 \otimes G) \cup (D \otimes 1)$ be a set made of copies of $G$ and $D$. The tensor product notation is used to emphasize that the copies of $G$ and $D$ are disjoint. Let $H = (V, E)$ be the undirected graph having $V$ as set of vertices and as edges the pairs $\{1 \otimes g, d \otimes 1\}$ such that $gwd \in F$ (this graph is close to, but slightly different from, the incidence graph for $GwD$ as it will be defined in Section 6). We have $\mathrm{Card}(V) = \mathrm{Card}(G) + \mathrm{Card}(D)$ and, by condition (i), $\mathrm{Card}(E) = \mathrm{Card}(GwD \cap F)$. We show that the graph $H$ is a tree. This implies our conclusion since, in a tree, one has $\mathrm{Card}(E) = \mathrm{Card}(V) - 1$.

Let us prove that the graph $H$ is a tree by induction on the sum of the lengths of the words of $D$, assuming that the pair $G, D$ satisfies conditions (ii) and (iii). Assume first that $D = \{1\}$. Since $Gw \subset F$, one has $GwD \subset F$. Consequently, $\{1 \otimes g, 1 \otimes 1\} \in E$ for any $g \in G$. Thus $H$ is a tree.

Next, assume that $D \neq \{1\}$. Let $d \in D$ be of maximal length. Set $d = d'a$ with $a \in A$.

Suppose first that $d'A \cap D = \{d\}$. Let $D' = (D \cup d') \setminus d$. Since $DA^*$ is $w^{-1}F$-dense, the word $wd'$ is not right-special. Thus for each $g \in G$, we have $gwd' \in F$ if and only if $gwd \in F$. This shows that the graph $H$ is isomorphic to the graph $H'$ corresponding to the pair $(G, D')$. The set $D'$ satisfies condition (ii). By induction $H'$ is a tree. Consequently $H$ is a tree.

Suppose next that $d'A \cap D$ has more than one element. Then $d'$ is right-special and $d'A \cap D = d'A$. Let $D' = (D \cup d') \setminus d'A$. Then $D'$ satisfies condition (ii). Let $H'$ be the graph corresponding to the pair $(G, D')$. By induction hypothesis, the graph $H'$ is a tree. Since $wD \subset F$, $wd'$ is right-special. Let $uwd'$ be a right-special word such that $u$ is longer than any word of $G$. Since $A^*G$ is $Fw^{-1}$-dense, and since $u \in Fw^{-1}$, $u$ has a suffix $g$ in $G$. Thus $gwd'$ is right-special. We have $\{1 \otimes g, d'a \otimes 1\} \in E$ for all $a \in A$. For any other element $g' \in G$ such that $g'wd' \in F$, since $g'wd'$ is not right-special, there is exactly one $a' \in A$ such that $g'wd'a' \in F$. There is a path between $g$ and every $g' \neq g$ since $\{1 \otimes g', d'a' \otimes 1\} \in E$ for some $a'$ and $\{1 \otimes g, d'a' \otimes 1\} \in E$ for all $a'$ (see Figure 5.3). Thus the graph $H$ is connected and acyclic, and therefore is a tree. ■



The following example shows that condition (i) is necessary.

Example 5.2.9 Let $F$ be the Fibonacci set. Let $G = \{ab, aba\}$, $w = a$ and $D = \{ab, b\}$. Then conditions (ii) and (iii) are satisfied but not condition (i), since $abaab = ab \cdot a \cdot ab = aba \cdot a \cdot b$. We have $GwD \cap F = \{abaab, abab\}$ and thus the conclusion of Proposition 5.2.8 is false.

Figure 5.3: The graphs $H'$ and $H$.



Example 5.2.10 Let $F$ be the Fibonacci set and let $X = \{aaba, abaa, abab, baab, baba\}$ be the set of words of $F$ of length 4. The internal transformation from $X$ relative to the word $w = aba$ gives $Y = \{aabaa, aabab, aba, baab, babaa\}$. We have $G = D = \{a, b\}$. The codes $X$, $Y$ and the graph $H$ of the proof of Proposition 5.2.8 are represented in Figure 5.4.

Figure 5.4: The codes $X$, $Y$ and the graph $H$.
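The internal transformation of Example 5.2.10 and the count of Proposition 5.2.8 can be replayed on a finite prefix of the Fibonacci word. This sketch assumes that $G = Xw^{-1}$ and $D = w^{-1}X$ (consistent with $G = D = \{a, b\}$ in the example) and applies formula (5.1) literally; the function names are hypothetical.

```python
def fibonacci_prefix(iterations=15):
    """Prefix of the Fibonacci word, by iterating a -> ab, b -> a."""
    w = "a"
    for _ in range(iterations):
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w


FIB = fibonacci_prefix()


def in_F(word):
    """Membership in the Fibonacci set, tested on a finite prefix."""
    return word in FIB


def gwd_in_F(G, w, D):
    """The set GwD ∩ F."""
    return {g + w + d for g in G for d in D if in_F(g + w + d)}


def internal_transformation(X, w):
    """Return (G, D, Y) where Y is given by formula (5.1)."""
    G = {x[:len(x) - len(w)] for x in X if x.endswith(w) and x != w}
    D = {x[len(w):] for x in X if x.startswith(w) and x != w}
    Y = ((set(X) | {w} | gwd_in_F(G, w, D))
         - {g + w for g in G} - {w + d for d in D})
    return G, D, Y
```

Calling `internal_transformation({"aaba", "abaa", "abab", "baab", "baba"}, "aba")` yields $G = D = \{a, b\}$ and the code $Y$ of the example; `gwd_in_F` then has $2 + 2 - 1 = 3$ elements, while for the data of Example 5.2.9 it has only 2.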



5.3 Periodicity

Let $x = a_0 a_1 \cdots$, with $a_i \in A$, be an infinite word. It is periodic if there is an integer $n \geq 1$ such that $a_{i+n} = a_i$ for all $i \geq 0$. It is ultimately periodic if the equalities hold for all $i$ large enough. Thus, $x$ is ultimately periodic if there is a word $u$ and a periodic infinite word $y$ such that $x = uy$. The following result, due to Coven and Hedlund, is well-known (see [?, Theorem 1.3.13]).

Theorem 5.3.1 Let $x \in A^{\mathbb{N}}$ be an infinite word on an alphabet with $k$ letters. If there exists an integer $d \geq 1$ such that $x$ has at most $d + k - 2$ factors of length $d$ then $x$ is ultimately periodic.

We will prove a generalization of this result.

Let $x$ be an infinite word and let $X$ be a thin maximal bifix code. Let $u$ be a prefix of $x$ and set $x = uy$. Since $F(y) \subset F(x)$, one has $d_{F(y)}(X) \leq d_{F(x)}(X)$. The word $x$ is called $X$-stable if $d_{F(y)}(X) = d_{F(x)}(X)$ for all suffixes $y$ of $x$. Let $u$ be a prefix of $x$ such that $d_{F(y)}(X)$ is minimal. Then the infinite word $y$ is $X$-stable.

For example, if $x = ba^\omega$ and $X = a \cup ba^*b$, then an $X$-stable suffix of $x$ is $a^\omega$.

Theorem 5.3.2 Let $X$ be a thin maximal bifix code and let $x \in A^{\mathbb{N}}$ be an $X$-stable infinite word. If $\mathrm{Card}(X \cap F(x)) \leq d_{F(x)}(X)$, then $x$ is ultimately periodic.

Corollary 5.3.3 Let $x \in A^{\mathbb{N}}$ be an infinite word. If there exists a finite maximal bifix code $X$ of degree $d$ such that $\mathrm{Card}(X \cap F(x)) \leq d$, then $x$ is ultimately periodic.

Proof. Since any long enough word has $d$ parses, $d_{F(x)}(X) = d$ and $x$ is $X$-stable. Since $\mathrm{Card}(X \cap F(x)) \leq d$, the conclusion follows by Theorem 5.3.2. ■

Corollary 5.3.3 implies Theorem 5.3.1 in the case $k = 2$ since $A^d$ is a maximal bifix code of degree $d$.

Example 5.3.4 Let us consider again the finite maximal bifix code $X$ of degree 3 defined by $X = \{a^3, a^2ba, a^2b^2, ab, ba^2, baba, bab^2, b^2a, b^3\}$ (see Example 4.2.12). Assume that $X \cap F = \{a^2ba, ab, baba\}$, where $F = F(x)$ and $x \in A^{\mathbb{N}}$. Since $ba$ is a factor of $x$, there exist a word $u$ and an infinite word $y$ such that $x = ubay$. Next, the first letter of $y$ is $b$ (otherwise, $ba^2 \in X \cap F$) and the second letter of $y$ is $a$ (otherwise, $bab^2 \in X \cap F$). This argument shows that whenever $uba$ is a prefix of $x$ then $ubaba$ is also a prefix of $x$, i.e., $x = u(ba)^\omega$, with $u \in A^*$.

Example 5.3.5 The set $X = a \cup ba^*b$ is a maximal bifix code of degree 2. An argument similar to the previous one shows that any infinite word $x \in A^{\mathbb{N}}$ such that $X \cap F(x) = \{a, bab\}$ belongs to the set $a^*(ba)^\omega$. Thus it is ultimately periodic.

Corollary 5.3.6 Let $x \in A^{\mathbb{N}}$ be an infinite word and let $X$ be a thin maximal bifix code. Let $y$ be an $X$-stable suffix of $x$ and let $F = F(y)$. If $\mathrm{Card}(X \cap F) \leq d_F(X)$, then $x$ is ultimately periodic.

Proof. By Theorem 5.3.2, the word $y$ is ultimately periodic, and so is $x$. ■

The following example shows that Corollary 5.3.6 may become false if we replace $F = F(y)$ by $F = F(x)$ in the statement.

Example 5.3.7 Let $X$ be the maximal bifix code of degree 4 on the alphabet $A = \{a, b, c\}$ with kernel $K = \{a, b\}^2$.

Let $x = ccay$ where $y$ is an infinite word without any occurrence of $c$. Then $cca$ has no factor in $X$. Indeed, a word of $X$ of length at most 3 is in the kernel of $X$ and thus is not a factor of $cca$. Thus $cca$ has 4 parses with respect to $X$, namely $(1, 1, cca)$, $(c, 1, ca)$, $(cc, 1, a)$ and $(cca, 1, 1)$. Thus we have $d_{F(x)}(X) = 4$. On the other hand $X \cap F(x) \subset \{a, b\}^2$ and thus $\mathrm{Card}(X \cap F(x)) \leq d_{F(x)}(X)$ although $x$ need not be ultimately periodic. This shows that we cannot replace $F(y)$ by $F(x)$ in the statement of Corollary 5.3.6.

The proof of Theorem 5.3.2 uses the Critical Factorization Theorem (see [?]) that we recall below. For a pair of words $(p, s) \neq (1, 1)$, consider the set of nonempty words $r$ such that
$$A^*p \cap A^*r \neq \emptyset \quad \text{and} \quad sA^* \cap rA^* \neq \emptyset.$$
This is the set of nonempty words $r$ which are prefix-comparable with $s$ and suffix-comparable with $p$. This set is nonempty since it contains $r = sp$. The repetition $\mathrm{rep}(p, s)$ is the minimal length of such a nonempty word $r$.

Let $w = a_1 a_2 \cdots a_m$ be a word with $a_i \in A$. An integer $n \geq 1$ is a period of $w$ if for $1 \leq i \leq j \leq m$, $j - i = n$ implies $a_i = a_j$. Recall that a factorization of a word $w \in A^*$ is a pair $(p, s)$ of words such that $w = ps$.

Theorem 5.3.8 (Critical Factorization Theorem) For any word $w \in A^+$, the maximal value of $\mathrm{rep}(p, s)$ for all factorizations $(p, s)$ of $w$ is the least period of $w$.
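The Critical Factorization Theorem lends itself to a brute-force check on short words. The sketch below computes $\mathrm{rep}(p, s)$ directly from the description above (the shortest nonempty $r$ that is prefix-comparable with $s$ and suffix-comparable with $p$); the function names are ours, and the empty string stands for the empty word.

```python
from itertools import product


def least_period(w):
    """Smallest n >= 1 such that w[i] == w[i + n] whenever both are defined."""
    return next(n for n in range(1, len(w) + 1)
                if all(w[i] == w[i + n] for i in range(len(w) - n)))


def rep(p, s):
    """Minimal length of a nonempty r prefix-comparable with s and
    suffix-comparable with p."""
    n = 1
    while True:
        # The first min(n, |s|) letters of r are imposed by s, the last
        # min(n, |p|) letters by p; r exists iff they agree where they overlap.
        lo = n - min(n, len(p))
        hi = min(n, len(s))
        if all(s[j] == p[len(p) - n + j] for j in range(lo, hi)):
            return n
        n += 1


def max_rep(w):
    """Maximum of rep(p, s) over all factorizations w = ps."""
    return max(rep(w[:i], w[i:]) for i in range(len(w) + 1))


def theorem_holds_up_to(max_len, alphabet="ab"):
    """Compare max_rep and least_period on all words of length 1..max_len."""
    return all(max_rep("".join(t)) == least_period("".join(t))
               for m in range(1, max_len + 1)
               for t in product(alphabet, repeat=m))
```

For example, `max_rep("abaab")` and `least_period("abaab")` both equal 3, the maximum being reached at the factorization $(ab, aab)$.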

We will also use the following lemma.

Lemma 5.3.9 Let $x$ be an infinite word and $n \geq 1$ be an integer such that the least period of an infinite number of prefixes of $x$ is at most $n$. Then $x$ is periodic.

Proof. Since the least period of an infinite number of prefixes of $x$ is at most $n$, an infinity of them have the same least period. Let $p$ be such that an infinite number of prefixes of $x$ have least period $p$. Set $x = a_0 a_1 \cdots$ with $a_i \in A$. For each $i \geq 0$, there is a prefix of $x$ of length larger than $i + p$ with least period $p$. Thus $a_i = a_{i+p}$. This shows that $x$ is periodic. ■

Proof of Theorem 5.3.2.
Let $S = A^* \setminus A^*X$ and $P = A^* \setminus XA^*$. Set $F = F(x)$ and $d = d_F(X)$. Since $\mathrm{Card}(X \cap F) \leq d_F(X) \leq d(X)$, the set $X \cap F$ is finite. Since $x$ is $X$-stable, there are an infinite number of factors and therefore also of prefixes of $x$ which have $d$ parses with respect to $X$. Indeed, for any factorization $x = uy$, we have $d_{F(y)}(X) = d$ and thus $y$ has a factor which has $d$ parses, so it has a prefix $w$ with $d$ parses, and finally $uw$ is a prefix of $x$ with $d$ parses.

Let $n$ be the maximal length of the words in $X \cap F$. Let $u$ be a prefix of $x$ of length larger than $n$ which has $d$ parses and set $x = uy$. Let $w$ be a nonempty prefix of $y$ and set $y = wz$. Let $v$ be a prefix of $z$ of length larger than $n$ which has $d$ parses.

Let $(p, s)$ be a factorization of $w$. We show that $\mathrm{rep}(p, s) \leq n$.

Since $up$ has $d$ parses with respect to $X$, there are $d$ suffixes $p_1, p_2, \ldots, p_d$ of $up$ which are in $P$. We may assume that $p_1 = 1$. Similarly, there are $d$ prefixes $s_1, s_2, \ldots, s_d$ of $sv$ which are in $S$. We may assume that $s_1 = 1$.

Since $upsv$ has $d$ parses, for each $p_i$ with $2 \leq i \leq d$ there is exactly one $s_j$ with $2 \leq j \leq d$ such that $p_i s_j \in X$. Indeed, there is a prefix $s'$ of $sv$ such that $p_i s' \in X$. Since $s'$ must be one of the $s_j$, the conclusion follows.

We may renumber the $s_i$ in such a way that $p_i s_i \in X$ for $2 \leq i \leq d$. Set $x_i = p_i s_i$. Since $up \notin S$, we have $up \in A^*X$. Let $x_0$ be the word of $X$ which is a suffix of $up$. Similarly, let $x_1$ be the word of $X$ which is a prefix of $sv$ (see Figure 5.5).

Figure 5.5: The $d + 1$ words $x_0, x_1, \ldots, x_d$.

Since $\mathrm{Card}(X \cap F) \leq d$, two of the $d + 1$ words $x_0, x_1, \ldots, x_d$ are equal.

If $x_0 = x_1$, then $\mathrm{rep}(p, s) \leq n$.

If $x_0 = x_i$ for an index $i$ with $2 \leq i \leq d$, then $s_i$ is a suffix of $up$ (since it is a suffix of $x_0$) and a prefix of $sv$ (by definition of $s_i$). Furthermore $|s_i| \leq n$ (since $n$ is the maximal length of the words of $X \cap F$). Thus $\mathrm{rep}(p, s) \leq |s_i| \leq n$.

The case where $x_i = x_1$ for an index $i$ with $2 \leq i \leq d$ is similar.

Assume finally that $x_i = x_j$ for some indices $i, j$ such that $2 \leq i < j \leq d$. We may assume that $|p_i| < |p_j|$. Thus $p_j = p_i t$ and $t s_j = s_i$. As a consequence, $t$ is both a suffix of $up$ (since it is a suffix of $p_j$) and a prefix of $sv$ (since it is a prefix of $s_i$). Thus again, $\mathrm{rep}(p, s) \leq |t| \leq n$.

By the Critical Factorization Theorem, this implies that the least period of $w$ is at most equal to $n$. Thus an infinite number of prefixes of $y$ have least period at most $n$. By Lemma 5.3.9, it implies that $y$ is periodic. Hence $x = uy$ is ultimately periodic. ■



6 Bases of subgroups 

In this section, we push further the study of bifix codes in Sturmian sets. The
main result of Section 6.2 is Theorem 6.2.1. It states that an $F$-maximal bifix
code $X \subset F$ of $F$-degree $d$ is a basis of a subgroup of index $d$ of the free group
on $A$. The proof uses two sets of preliminary results. The first part concerns
bases of subgroups composed of words over A, already considered in Q. The 
second one uses the first return words, which were introduced independently 
in [|3), and which we use in the framework of ||3^ and Q, up to a left- 
right symmetry (see also Q). 

We denote by A° the free group generated by A. The rank of A° is Card(A). 
Note that all sets generating a free group of rank k have at least k elements. 
A basis is a minimal generating set. All bases have exactly k elements (see 
e.g. 13). 

Let $H$ be a subgroup of rank $n$ and of index $d$ of a free group of rank $k$. Then

$$n = d(k - 1) + 1. \qquad (6.1)$$

Formula (6.1) is called Schreier's Formula.

The free monoid $A^*$ is viewed as embedded in $A^\circ$. An element of the free group is represented by its unique reduced word on the alphabet $A \cup A^{-1}$. The elements of the free monoid $A^*$ are themselves reduced words since they do not contain any letter in $A^{-1}$. Thus $A^*$ is a submonoid of $A^\circ$. The subgroup of $A^\circ$ generated by a subset $X$ of $A^\circ$ is denoted $\langle X \rangle$.

In any group $G$, the right cosets of a subgroup $H$ are the sets of the form $Hg$ for $g \in G$. Two right cosets of the same subgroup are disjoint or equal. The index of a subgroup is the number of its distinct right cosets. If $K$ is a subgroup of the subgroup $H$, then the index of $K$ in $G$ is the product of the index of $K$ in $H$ and of the index of $H$ in $G$. If $H, K$ are two subgroups of index $d$ of a group $G$, then $H \subset K$ implies $H = K$.

Assume now that $G$ is a group of permutations over a set $Q$. For any $q$ in $Q$, the set of elements of $G$ that fix $q$ is a subgroup of $G$.

The group $G$ is transitive if, for all $p, q \in Q$, there is an element $g \in G$ such that $pg = q$. In this case, the subgroup $H$ of permutations fixing a given element $p$ of $Q$ has index $\mathrm{Card}(Q)$. Indeed, for each $q \in Q$ let $g_q$ be an element of $G$ such that $p g_q = q$. If $g \in G$ is such that $pg = q$, then $p g g_q^{-1} = p$ and consequently $g g_q^{-1} \in H$, whence $g \in H g_q$. Thus each $g \in G$ is in one of the right cosets $H g_q$, for $q \in Q$. Since these right cosets are pairwise disjoint, the index of $H$ is $\mathrm{Card}(Q)$.

6.1 Group automata

A simple automaton $\mathcal{A} = (Q, 1, 1)$ is said to be reversible if for any $a \in A$, the partial map $\varphi_{\mathcal{A}}(a) : p \mapsto p \cdot a$ is injective. This condition allows one to construct the reversal of the automaton as follows: whenever $q \cdot a = p$ in $\mathcal{A}$, then $p \cdot a = q$ in the reversal automaton. The state 1 is the initial and the unique terminal state of this automaton. Any reversible automaton is minimal [?]. The set recognized by a reversible automaton is a left and right unitary submonoid. Thus it is generated by a bifix code.

An automaton $\mathcal{A} = (Q, 1, 1)$ is a group automaton if for any $a \in A$ the map $\varphi_{\mathcal{A}}(a) : p \mapsto p \cdot a$ is a permutation of $Q$. When $Q$ is finite, a group automaton is a reversible automaton which is complete.

The following result is from [?] (see also Exercise 6.1.2 in [?]).

Proposition 6.1.1 Let $X \subset A^+$ be a bifix code. The following conditions are equivalent.

(i) $X^* = \langle X \rangle \cap A^*$;

(ii) the minimal automaton of $X^*$ is reversible.

Let $\mathcal{A} = (Q, i, T)$ be a deterministic automaton. A generalized path is a sequence $(p_0, a_1, p_1, a_2, \ldots, p_{n-1}, a_n, p_n)$ with $a_i \in A \cup A^{-1}$ and $p_i \in Q$, such that for $1 \leq i \leq n$, one has $p_{i-1} \cdot a_i = p_i$ if $a_i \in A$ and $p_i \cdot a_i^{-1} = p_{i-1}$ if $a_i \in A^{-1}$. The label of the generalized path is the element $a_1 a_2 \cdots a_n$ of the free group $A^\circ$. Note that if $\mathcal{A} = (Q, 1, 1)$, the set of labels of generalized paths from 1 to 1 in $\mathcal{A}$ is a subgroup of $A^\circ$. It is called the subgroup described by $\mathcal{A}$.

A path in an automaton is a particular case of a generalized path. In the case where $\mathcal{A}$ has a unique terminal state which is equal to the initial state, the submonoid of $A^*$ recognized by $\mathcal{A}$ is contained in the subgroup of $A^\circ$ described by $\mathcal{A}$.

Example 6.1.2 Let $\mathcal{A} = (Q, 1, 1)$ be the automaton defined by $Q = \{1, 2\}$, $1 \cdot a = 1 \cdot b = 2$ and $2 \cdot a = 2 \cdot b = \emptyset$. The submonoid recognized by $\mathcal{A}$ is $\{1\}$. The subgroup described by $\mathcal{A}$ is the cyclic group generated by $ab^{-1}$.

Proposition 6.1.3 Let $\mathcal{A}$ be a simple automaton and let $X$ be the prefix code generating the submonoid recognized by $\mathcal{A}$. The subgroup $H$ described by $\mathcal{A}$ is generated by $X$. If moreover $\mathcal{A}$ is reversible, then $X^* = H \cap A^*$.

Proof. Set $\mathcal{A} = (Q, 1, 1)$. Let $H = \langle X \rangle$ and let $K$ be the subgroup described by $\mathcal{A}$. Let us show that $K = H$. First, $X \subset K$ implies $\langle X \rangle = H \subset K$. To prove the converse inclusion, let $(p_0, a_1, p_1, a_2, \ldots, p_{n-1}, a_n, p_n)$ be a generalized path with $a_i \in A \cup A^{-1}$, $p_i \in Q$ and $p_0 = p_n = 1$. Let $h \in A^\circ$ be the label of the path. Let $r$ be the number of indices $i$ such that $a_i \in A^{-1}$. We show by induction on $r$ that $h \in H$. This holds clearly if $r = 0$. Assume that it is true for $r - 1$. Let $i$ be the least index such that $a_i \in A^{-1}$. Set $u = a_1 \cdots a_{i-1}$, $a = a_i^{-1}$, $v = a_{i+1} \cdots a_n$ in such a way that $h = ua^{-1}v$. Set also $p = p_{i-1}$ and $q = p_i$. Thus $1 \cdot u = p$, $q \cdot a = p$ and $v$ is the label of a generalized path from $q$ to 1. Since $\mathcal{A}$ is trim there exist words $w, t \in A^*$ such that $p \cdot t = 1$ and $1 \cdot w = q$. Since $1 \cdot ut = 1 \cdot wat = 1$ (see Figure 6.1), we have $ut, wat \in X^*$. By induction hypothesis, since $wv$ is the label of a generalized path from 1 to 1, we have $wv \in H$. Then $ua^{-1}v = utt^{-1}a^{-1}w^{-1}wv = ut(wat)^{-1}wv$ is in $H$ and thus $h \in H$.

Assume now that $\mathcal{A}$ is reversible. Then $\mathcal{A}$ is minimal and, by Proposition 6.1.1, one has $X^* = H \cap A^*$. ■

Figure 6.1: Paths in the automaton $\mathcal{A}$. The generalized path is dashed.

For any subgroup $H$ of $A^\circ$, the submonoid $H \cap A^*$ is right and left unitary. Thus $H \cap A^*$ is generated by a bifix code. A subgroup $H$ of $A^\circ$ is positively generated if there is a set $X \subset A^*$ which generates $H$. In this case, the set $H \cap A^*$ generates the subgroup $H$. Let $X$ be the bifix code which generates the submonoid $H \cap A^*$. Then $X$ generates the subgroup $H$. This shows that, for a positively generated subgroup $H$, there is a bifix code which generates $H$.

Proposition 6.1.4 For any positively generated subgroup $H$ of $A^\circ$, there is a unique reversible automaton $\mathcal{A}$ such that $H$ is the subgroup described by $\mathcal{A}$.

Proof. Let $X$ be the bifix code generating the submonoid $H \cap A^*$, so that $X^* = H \cap A^*$. Since $H$ is positively generated, the subgroup generated by $X$ is equal to $H$, that is $\langle X \rangle = H$. Thus $X^* = \langle X \rangle \cap A^*$. In view of Proposition 6.1.1, the minimal automaton $\mathcal{A}$ of $X^*$ is reversible. Thus the submonoid recognized by $\mathcal{A}$ is $H \cap A^*$ and by Proposition 6.1.3, $H$ is the subgroup described by $\mathcal{A}$.

If $\mathcal{B}$ is another reversible automaton such that $H$ is the subgroup described by $\mathcal{B}$, then by Proposition 6.1.1, $\mathcal{B}$ recognizes the set $H \cap A^*$. Since $\mathcal{B}$ is minimal and since minimal automata are unique, the uniqueness follows. ■

The reversible automaton $\mathcal{A}$ such that $H$ is the subgroup described by $\mathcal{A}$ is called the Stallings automaton of the subgroup $H$. It can also be defined for a subgroup which is not positively generated (see [?] or [?]).

Proposition 6.1.5 The following conditions are equivalent for a submonoid $M$ of $A^*$.

(i) $M$ is recognized by a group automaton with $d$ states.

(ii) $M = \varphi^{-1}(K)$, where $K$ is a subgroup of index $d$ of a group $G$ and $\varphi$ is a surjective morphism from $A^*$ onto $G$.

(iii) $M = H \cap A^*$, where $H$ is a subgroup of index $d$ of $A^\circ$.

If one of these conditions holds, the minimal generating set of $M$ is a maximal bifix code of degree $d$.

Proof. (i) implies (ii). Let $\mathcal{A} = (Q, 1, 1)$ be a group automaton with $d$ states and let $M$ be the set recognized by $\mathcal{A}$. Since a composition of permutations is a permutation, the monoid $G = \varphi_{\mathcal{A}}(A^*)$ is a permutation group. Since $\mathcal{A}$ is trim, there is a path from every state $q$ to any state $q'$ in $\mathcal{A}$. Consequently, $G$ is transitive. Let $K$ be the subgroup of $G$ formed of the permutations fixing 1. As we have seen earlier, $K$ has index $d$. Then $M = \varphi_{\mathcal{A}}^{-1}(K)$.

(ii) implies (iii). Let $\psi$ be the morphism from $A^\circ$ onto $G$ extending $\varphi$. Then $H = \psi^{-1}(K)$ is a subgroup of index $d$ of $A^\circ$ and $M = H \cap A^*$.

(iii) implies (i). Let $Q$ be the set of right cosets of $H$ with 1 denoting the right coset $H$. The representation of $A^\circ$ by permutations on $Q$ defines a group automaton $\mathcal{A}$ with $d$ states and $M$ is recognized by $\mathcal{A}$.

Finally, let $X$ be the minimal generating set of a submonoid $M$ satisfying one of these conditions. It is clearly a bifix code. Let $P$ be the set of proper prefixes of $X$. The number of suffixes of a word which are in $P$ is at most equal to $d$. Indeed, let $\mathcal{A} = (Q, 1, 1)$ be a group automaton recognizing $X^*$. If $s, t$ are distinct suffixes of a word $w$ which are in $P$, then $1 \cdot s \neq 1 \cdot t$. Indeed, otherwise, since $s$ and $t$ are suffix-comparable, we may assume that $s = ut$. Let $p = 1 \cdot u$. Then $p \cdot t = 1 \cdot ut = 1 \cdot s = 1 \cdot t$ and thus $p = 1$ since $\mathcal{A}$ is reversible, so that $u \in X^*$. Since $u$ is a prefix of $s \in P$, this forces $u = 1$ and thus $s = t$, a contradiction. Let $w$ be a word with the maximal number of suffixes in $P$. Then $w$ cannot be an internal factor of $X$. Moreover the number of suffixes of $w$ in $P$ is equal to $d$. Indeed, since $\mathcal{A}$ is a group automaton, for any $q \in Q$, there is a state $q'$ such that $q' \cdot w = q$. Since $w$ is not an internal factor of $X$, there is a factorization $w = sxp$ such that $q' \cdot s = 1$, $1 \cdot x = 1$ and $1 \cdot p = q$, and such that $(s, x, p)$ is a parse of $w$. Thus $X$ has degree $d$. ■

A bifix code $Z$ such that $Z^*$ satisfies one of the equivalent conditions of Proposition 6.1.5 is called a group code.



The following proposition shows in particular that a subgroup of finite index is positively generated.

Proposition 6.1.6 Let $H$ be a subgroup of finite index of $A^\circ$. The minimal automaton $\mathcal{A}$ of $H \cap A^*$ is a group automaton which describes the subgroup $H$. Let $X$ be the group code such that $\mathcal{A}$ recognizes $X^*$. The subgroup generated by $X$ is $H$.

Proof. By Proposition 6.1.5, the monoid $H \cap A^*$ is recognized by a group automaton $\mathcal{A} = (Q, 1, 1)$, which is the minimal automaton of $H \cap A^*$. The morphism $\varphi$ from $A^*$ onto the group $G = \varphi_{\mathcal{A}}(A^*)$ of Proposition 6.1.5 extends to a morphism $\psi$ from $A^\circ$ onto $G$. The subgroup $K$ is composed of the permutations that fix 1, and the subgroup $H$ is formed of the elements $w \in A^\circ$ such that the permutation $\psi(w)$ fixes 1. There is a generalized path in $\mathcal{A}$ from $p$ to $q$ labeled $w$ if and only if $p\,\psi(w) = q$. Thus $\psi(w)$ fixes 1 if and only if there is a generalized path from 1 to 1 labeled $w$, that is if $w \in H$. Thus the subgroup described by $\mathcal{A}$ is $H$. By Proposition 6.1.3, the subgroup $H$ is generated by $X$. ■

Example 6.1.7 The set $A^d$ is a group code by Proposition 6.1.5(ii). Thus it is a maximal bifix code of degree $d$. The intersection of the subgroup generated by $A^d$ with $A^*$ is the submonoid generated by $A^d$ (Proposition 6.1.6). It is composed of the words with length a multiple of $d$.
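Example 6.1.7 can be made concrete with a small group automaton. In the sketch below, which is only an illustration, the automaton has states $0, \ldots, d-1$ and every letter acts as the cyclic shift $i \mapsto i + 1 \bmod d$; the group code is then obtained as the set of first-return words to the initial state. The names `first_return_words` and `cyclic_step` are hypothetical.

```python
def first_return_words(step, start, alphabet, max_len):
    """Labels of paths from `start` back to `start` that do not visit `start`
    in between, restricted to words of length at most max_len."""
    found = []
    frontier = [("", start)]          # words not yet returned, with their state
    for _ in range(max_len):
        next_frontier = []
        for word, state in frontier:
            for a in alphabet:
                target = step(state, a)
                if target == start:
                    found.append(word + a)
                else:
                    next_frontier.append((word + a, target))
        frontier = next_frontier
    return found


def cyclic_step(d):
    """Transition function of the group automaton where each letter maps
    state i to state (i + 1) mod d."""
    return lambda state, letter: (state + 1) % d
```

For $d = 3$ and the alphabet $\{a, b\}$, `first_return_words(cyclic_step(3), 0, "ab", 6)` returns exactly the 8 words of length 3, that is the group code $A^3$.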



6.2 Main result

We will prove the following result.

Theorem 6.2.1 Let $F$ be a Sturmian set and let $d \geq 1$ be an integer. A bifix code $X \subset F$ is a basis of a subgroup of index $d$ of $A^\circ$ if and only if it is a finite $F$-maximal bifix code of $F$-degree $d$.

Note that Theorem 5.2.1 is contained in Theorem 6.2.1 (we will use Theorem 5.2.1 in the proof of Theorem 6.2.1). Indeed, let $X$ be an $F$-maximal bifix code of $F$-degree $d$. By Theorem 4.4.3, $X$ is finite. By Theorem 6.2.1, the subgroup $\langle X \rangle$ has rank $\mathrm{Card}(X)$ and index $d$ in the free group $A^\circ$. By Schreier's Formula (6.1), one gets $\mathrm{Card}(X) = (\mathrm{Card}(A) - 1)\,d + 1$.

1773 Before proving Theorem |6.2.l| , we list some corollaries. 

Corollary 6.2.2 Let F be a Sturmian set. For any d ≥ 1, the set of words in
F of length d is a basis of the subgroup of A° generated by A^d.

Proof. The set A^d is a group code (see Example 6.1.7), and therefore is a
maximal bifix code. The set A^d ∩ F is a finite bifix code. By Theorem 4.2.11,
it is an F-maximal bifix code and has F-degree d. The corollary follows from
Theorem 6.2.1. Indeed, ⟨A^d ∩ F⟩ = ⟨A^d⟩ since the first subgroup is contained
in the second and both are subgroups of A° of index d. ■
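As a numerical illustration (not part of the original argument), the following Python sketch checks the cardinality predicted by Corollary 6.2.2 and Schreier's Formula on the Fibonacci set, where k = 2 and hence A^d ∩ F should have (k − 1)d + 1 = d + 1 elements. The helper names and the prefix length 500 are assumptions made for this example; a finite prefix only approximates the set F, which is sufficient for the short lengths tested here.

```python
def fibonacci_prefix(n):
    """Prefix of length n of the Fibonacci word (fixed point of a -> ab, b -> a)."""
    w = "a"
    while len(w) < n:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:n]


def factors(x, d):
    """Set of factors of length d occurring in the finite word x."""
    return {x[i:i + d] for i in range(len(x) - d + 1)}


x = fibonacci_prefix(500)   # assumed long enough for the lengths below
k = 2                       # alphabet {a, b}

# counts[d] is the number of words of length d in the (approximated) Fibonacci set
counts = {d: len(factors(x, d)) for d in range(1, 11)}
words_of_length_3 = factors(x, 3)
```

For d = 3 the sketch yields the four words aab, aba, baa, bab, that is, a basis with (2 − 1)·3 + 1 elements.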



The following is also a complement to Theorem 4.2.11. It shows in particular
that for any Sturmian set F, any subgroup of A° of finite index has a basis
contained in F. Note that this contains the fact that every subgroup of finite
index has a positive basis; see also Proposition 6.1.6.



Corollary 6.2.3 Let F be a Sturmian set. The map which associates to X ⊂ F
the subgroup ⟨X⟩ of A° generated by X is a bijection between F-maximal bifix
codes of F-degree d and subgroups of A° of index d. Such a bifix code X is a
basis of ⟨X⟩. The reciprocal bijection associates, to a subgroup H of A°, the
set Z ∩ F where Z is the group code which is the minimal generating set of the
submonoid H ∩ A* of A*.

Proof. Let first X be a finite F-maximal bifix code of F-degree d. Then ⟨X⟩ is
a subgroup of index d by Theorem 6.2.1.

Conversely, let H be a subgroup of index d of A° and let Z be the group
code such that Z* = H ∩ A*. By Theorem 4.2.11, the set X = Z ∩ F is an
F-maximal bifix code of F-degree e ≤ d. By Theorem 4.4.3, X is finite. By
Theorem 6.2.1, the subgroup ⟨X⟩ has index e. Since ⟨X⟩ is a subgroup of H, e
is a multiple of d. Thus d = e and ⟨X⟩ = H.

Finally, let X be an F-maximal bifix code of F-degree d. Then H = ⟨X⟩ is
a subgroup of index d of A°. Let Z be the group code such that Z* = H ∩ A*
and let Y = Z ∩ F. Then X ⊂ Y and thus X = Y since X is an F-maximal
bifix code. This shows that the two maps are mutually inverse.






A set W of words of {a, b}* is balanced if for all w, w' ∈ W, |w| = |w'| implies
||w|_a − |w'|_a| ≤ 1. It is a classical property that the set of factors of a
Sturmian word is balanced (Theorem 2.1.5 in js^). Thus any Sturmian set on two
letters is balanced.
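The balance condition can be tested mechanically on finite sets. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and the Fibonacci set is approximated by the factors of a finite prefix of the Fibonacci word.

```python
from itertools import combinations


def is_balanced(words, letter="a"):
    """True if any two words of equal length differ by at most 1
    in their number of occurrences of `letter`."""
    return all(
        abs(w.count(letter) - v.count(letter)) <= 1
        for w, v in combinations(words, 2)
        if len(w) == len(v)
    )


def fibonacci_prefix(n):
    """Prefix of length n of the Fibonacci word (fixed point of a -> ab, b -> a)."""
    w = "a"
    while len(w) < n:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:n]


def factors_up_to(x, max_len):
    """All factors of x of length 1..max_len."""
    return {x[i:i + d] for d in range(1, max_len + 1)
            for i in range(len(x) - d + 1)}


fib_factors = factors_up_to(fibonacci_prefix(300), 8)
fib_is_balanced = is_balanced(fib_factors)          # factors of a Sturmian word
unbalanced_example = is_balanced({"aa", "bb"})      # |aa|_a - |bb|_a = 2
```

The set {aa, bb} fails the test, which is consistent with the fact that no Sturmian set on two letters contains both aa and bb.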

Following Richomme and Seebold |5^, we say that a subset X of {a, b}* is
factorially balanced if the set of factors of words of X is balanced. They show
that a finite set X ⊂ {a, b}* is contained in some Sturmian set if and only if
it is factorially balanced. Thus, we have the following consequence of
Theorem 6.2.1.



Corollary 6.2.4 Let X ⊂ {a, b}* be a bifix code. The following conditions are
equivalent.

(i) There exists a Sturmian set F ⊂ {a, b}* such that X ⊂ F and X is a
finite F-maximal bifix code.
(ii) X is a factorially balanced basis of a subgroup of finite index of {a, b}°.



As a further consequence of Theorem 6.2.1, we have the following result.

Corollary 6.2.5 Let F be a Sturmian set on an alphabet with k letters. The
number N_{d,k} of finite F-maximal bifix codes X ⊂ F of F-degree d satisfies
N_{1,k} = 1 and

\[ N_{d,k} = d\,(d!)^{k-1} - \sum_{i=1}^{d-1} \bigl((d-i)!\bigr)^{k-1} N_{i,k}. \]

The formula results directly from the formula, due to Hall p7| ] , for the number
of subgroups of index d in a free group of rank k.
The values for k = 2 are given by the recurrence

\[ N_{d,2} = d\,d! - \sum_{i=1}^{d-1} (d-i)!\, N_{i,2}. \]



The first values are

d        1  2  3   4   5    6     7      8       9        10
N_{d,2}  1  3  13  71  461  3447  29093  273343  2829325  31998903



The values for d = 2, 3 are consistent with Examples 4.3.13 and 5.2.6.
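The recurrence for the number of subgroups of index d in a free group of rank k is easy to evaluate. The following Python sketch (the function name hall_count is an assumption of this example) implements it directly and recomputes the first values for k = 2.

```python
from math import factorial


def hall_count(d, k):
    """Number N_{d,k} of subgroups of index d in a free group of rank k,
    computed with the recurrence
    N_{d,k} = d (d!)^(k-1) - sum_{i=1}^{d-1} ((d-i)!)^(k-1) N_{i,k}."""
    n = [0] * (d + 1)          # n[i] stores N_{i,k}; index 0 is unused
    for m in range(1, d + 1):
        n[m] = m * factorial(m) ** (k - 1) - sum(
            factorial(m - i) ** (k - 1) * n[i] for i in range(1, m)
        )
    return n[d]


first_values = [hall_count(d, 2) for d in range(1, 11)]
```

Two quick sanity checks: for k = 1 the free group is infinite cyclic and has exactly one subgroup of each index, and for k = 3 there are 2^3 − 1 = 7 subgroups of index 2.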

The formula is known to enumerate also the indecomposable permutations 
on d + 1 elements (see pli, || and [p|). 



6.3 Incidence graph 

Let X be a set, let P be the set of its proper prefixes and S be the set of its
proper suffixes. Set P' = P \ 1 and S' = S \ 1. The incidence graph of X is the
undirected graph G defined as follows. The set of vertices is
V = 1 ⊗ P' ∪ S' ⊗ 1. Similarly to the proof of Proposition \).2.i , the tensor
product notation is used to emphasize that V is made of two disjoint copies of
P' and S'. The edges of G are the pairs {1 ⊗ p, s ⊗ 1}, for p ∈ P' and s ∈ S',
such that ps ∈ X.

Let C be a connected component of G, that is a maximal set of vertices
connected by paths. The trace of C on P' is the set of p ∈ P' such that
1 ⊗ p ∈ C. Similarly, the trace of C on S' is the set of s ∈ S' such that
s ⊗ 1 ∈ C.
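The definition can be made concrete with a short Python sketch. It is an illustration only: the vertex encoding ("P", p) and ("S", s) for 1 ⊗ p and s ⊗ 1, the function names, and the choice of the code X = {aab, aba, baa, bab} (the words of length 3 of the Fibonacci set) are assumptions of this example.

```python
def incidence_graph(X):
    """Vertices ('P', p) and ('S', s); one edge for each factorization
    x = p s of a word x of X with p and s both nonempty."""
    edges = {(("P", x[:i]), ("S", x[i:])) for x in X for i in range(1, len(x))}
    vertices = {v for e in edges for v in e}
    return vertices, edges


def components(vertices, edges):
    """Connected components, computed by depth-first search."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def is_suffix_code(words):
    """True if no word of the set is a proper suffix of another one."""
    return not any(u != v and v.endswith(u) for u in words for v in words)


X = {"aab", "aba", "baa", "bab"}
V, E = incidence_graph(X)
comps = components(V, E)
# traces of the components on the nonempty proper prefixes
traces_on_P = sorted(sorted(p for side, p in c if side == "P") for c in comps)
# a finite graph is acyclic exactly when |E| = |V| - (number of components)
is_forest = len(E) == len(V) - len(comps)
```

On this code the graph has 10 vertices, 8 edges and two components, so it is a union of two trees, and the traces {a, b} and {aa, ab, ba} on P' are suffix codes.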





Figure 6.2: The F-maximal bifix code of F-degree 3 with kernel {a, baab}.



Example 6.3.1 Consider the F-maximal bifix code of F-degree 3 in the
Fibonacci set F given in Figure 6.2. It is a colored copy of Figure 5.1. The
incidence graph of X is given in Figure 6.3. It has two connected components
colored red and blue. The vertices on the left side are the 1 ⊗ p (written
simply p for convenience). The vertices on the right side are the s ⊗ 1 with
the same convention.

The color on the node in Figure 6.2 corresponds to the color of the
corresponding prefix in Figure 6.3.




Figure 6.3: The incidence graph of X. 



The following lemma uses an argument similar to Lemma 5.2.4.

Lemma 6.3.2 Let v_1, v_2, ..., v_{n+1} be words such that v_i, v_{i+1} are not
prefix-comparable for 1 ≤ i ≤ n. Let p_i be the longest common prefix of
v_i, v_{i+1}, for 1 ≤ i ≤ n. If two of the v_i are prefix-comparable, then two
of the p_i are equal.

Proof. Let V = {v_1, ..., v_{n+1}}, let P be the set of proper prefixes of V
and let W = V \ P. The set W is the set of words of V which are not a proper
prefix of a word of V. The set W is a prefix code. If two distinct words in V
are prefix-comparable, then Card(W) < Card(V) ≤ n + 1.

Let m be the number of distinct p_i. Since v_i, v_{i+1} are not
prefix-comparable for 1 ≤ i ≤ n, for each p_i there are at least two distinct
letters a, b such that p_i a, p_i b ∈ P ∪ W. This implies m < Card(W). Indeed,
the set W can be seen as the set of leaves in a tree, and each p_i is a fork
node (i.e. a node with at least two children) in this tree. It is well-known
that the number of fork nodes is strictly less than the number of leaves. If
two of the v_i are prefix-comparable, the inequality Card(W) < n + 1 implies
m < Card(W) ≤ n, and consequently two of the p_i are equal. ■

Lemma 6.3.3 Let F be a Sturmian set and let X ⊂ F be a bifix code. Let P'
(resp. S') be the set of nonempty proper prefixes (resp. suffixes) of X and let
G be the incidence graph of X.

(i) The graph G is acyclic, that is a union of trees.

(ii) The trace on P' (resp. on S') of a connected component C of G is a suffix
(resp. prefix) code.

(iii) In particular, this trace on P' (resp. on S') contains at most one
right-special (resp. left-special) word.

Proof. The last assertion follows from the second by Proposition 5.1.E . We call
a path reduced if it does not use equal consecutive edges.

We prove by induction on n ≥ 1 that if s ⊗ 1 and t ⊗ 1 (resp. 1 ⊗ p and
1 ⊗ q) are connected by a reduced path of length 2n in G, then s, t are not
prefix-comparable (resp. p, q are not suffix-comparable). This shows that G is
acyclic. Indeed, if there were a cycle from s to t = s in G, then s and t would
be prefix-comparable. This shows also that two words in the same trace on P'
(resp. on S') are not suffix-comparable (resp. are not prefix-comparable).



Figure 6.4: A path (s ⊗ 1, 1 ⊗ q, t ⊗ 1) on the left, and a path of length 2n
on the right.

The property holds for n = 1. Indeed, a reduced path of length 2 from s ⊗ 1
to t ⊗ 1 is of the form (s ⊗ 1, 1 ⊗ q, t ⊗ 1) with qs, qt ∈ X. Since the path
is reduced, s ≠ t, and since X is prefix, s and t are not prefix-comparable,
see Figure 6.4. The proof for prefixes is similar.

Let n ≥ 2. A path of length 2n from s ⊗ 1 to t ⊗ 1 is a sequence
(v_1 ⊗ 1, 1 ⊗ u_1, v_2 ⊗ 1, ..., 1 ⊗ u_n, v_{n+1} ⊗ 1) with s = v_1 and
t = v_{n+1} such that the 2n words defined for 1 ≤ i ≤ n by

\[ x_{2i-1} = u_i v_i, \qquad x_{2i} = u_i v_{i+1} \]

are in X. Moreover, since the path is reduced, one has x_j ≠ x_{j+1} for
1 ≤ j < 2n.

For 1 ≤ i ≤ n, let p_i be the longest common prefix of v_i, v_{i+1}. Since
x_{2i−1} ≠ x_{2i} and since the code X is prefix, the words v_i and v_{i+1} are
not prefix-comparable.

Arguing by contradiction, assume that v_1 and v_{n+1} are prefix-comparable.
By Lemma 6.3.2, we have p_i = p_j for some indices i, j with 1 ≤ i < j ≤ n.

Set v_i = p_i v_i' and v_{i+1} = p_i v_i''. Since v_i, v_{i+1} are not
prefix-comparable, the words v_i', v_i'' are nonempty. Since their longest
common prefix is empty, their initial letters are distinct. Thus u_i p_i is
right-special. Similarly u_j p_j is right-special. Thus u_i p_i and u_j p_j are
suffix-comparable. Since p_i = p_j, u_i and u_j are suffix-comparable.

But 1 ⊗ u_i and 1 ⊗ u_j are connected by the path
(1 ⊗ u_i, v_{i+1} ⊗ 1, ..., v_j ⊗ 1, 1 ⊗ u_j) of length 2(j − i) ≤ 2(n − 1).
By the induction hypothesis, u_i and u_j are not suffix-comparable, a
contradiction.

The proof that if 1 ⊗ p and 1 ⊗ q are connected by a path of length 2n in G,
then p, q are not suffix-comparable is similar. ■

Let X be a bifix code and let P be the set of proper prefixes of X. Consider
the equivalence θ_X on P which is the transitive closure of the relation formed
by the pairs p, q ∈ P such that ps, qs ∈ X for some s ∈ A^+. Such a pair
corresponds, when p, q ≠ 1, to a path (1 ⊗ p, s ⊗ 1, 1 ⊗ q) in the incidence
graph of X. Thus a class of θ_X is either reduced to the empty word or it is
the trace on P \ 1 of a connected component of the incidence graph of X.

Example 6.3.4 Consider the code X of Example 6.3.1 above. The three classes
of θ_X are the class {1} of the empty word, and the two suffix codes which are
the traces of connected components of the incidence graph on the set of
nonempty proper prefixes of X. These codes are
{babaabaaba, babaaba, baba, baa, b} and
{babaabaab, babaabaa, babaab, babaa, bab, ba}. They are shown in Figure 6.5.




Figure 6.5: The two suffix codes which are classes of the equivalence θ_X.

The following property relates the equivalence θ_X with the right cosets of
H = ⟨X⟩.

Proposition 6.3.5 Let X be a bifix code, let P be the set of proper prefixes of
X and let H be the subgroup generated by X. For any p, q ∈ P, p ≡ q mod θ_X
implies Hp = Hq.

Proof. Since p ≡ q mod θ_X, there is a path from 1 ⊗ p to 1 ⊗ q in the
incidence graph of X of length 2n, for some n ≥ 0. If n = 1, there is a word
s ∈ A^+ such that ps, qs ∈ X. Then p, q ∈ Hs^{-1} and thus Hp = Hq. The general
case follows by induction. ■



Let A = (P, 1, 1) be the literal automaton of X* (see Section 3.2). We show
that the equivalence θ_X is compatible with the transitions of the automaton A
in the following sense.

Lemma 6.3.6 Let F be a Sturmian set. Let X ⊂ F be a bifix code and let P
be the set of proper prefixes of X. Let p, q ∈ P and a ∈ A. If p ≡ q mod θ_X
and if p · a, q · a ≠ ∅ in the literal automaton of X*, then
p · a ≡ q · a mod θ_X.

Proof. Let G be the incidence graph of X.

Let p, q ∈ P and a ∈ A be such that p ≡ q mod θ_X and p · a, q · a ≠ ∅. If
p = 1, then q = 1 and the conclusion holds. Thus we may assume that p ≠ 1,
q ≠ 1, and that p ≠ q. Let (1 ⊗ u_0, v_1 ⊗ 1, 1 ⊗ u_1, ..., v_n ⊗ 1, 1 ⊗ u_n)
be a path in G with p = u_0, u_n = q. The corresponding words in X are
u_0v_1, u_1v_1, u_1v_2, ..., u_nv_n. We may assume that the words u_i are
pairwise distinct, and that the v_i are pairwise distinct. Moreover, since
p · a, q · a ≠ ∅, there exist words v, w such that pav, qaw ∈ X.

The proof is in two steps. In the first step, we assume that v_1 and v_n both
start with a. In the second step, we show that this condition is always
fulfilled.

Assume that v_1 and v_n begin with a. There are two cases.

Case 1: Assume first that pa, qa ∈ P. Then p · a = pa and q · a = qa. If
all words v_i begin with a, then clearly the equivalence pa ≡ qa mod θ_X holds.
Thus assume the contrary, and let i > 1 be minimal such that v_i begins with
a letter distinct from a and let j, with i ≤ j < n, be maximal such that v_j
begins with a letter distinct from a. Then both words u_{i−1} and u_j are
right-special (since u_{i−1}v_{i−1} starts with u_{i−1}a and u_{i−1}v_i starts
with u_{i−1}b for some letter b ≠ a, and similarly for u_j). But since u_{i−1}
and u_j are in the same trace on P' of a connected component of G, Lemma 6.3.3
implies that u_{i−1} = u_j, that is i − 1 = j. But this contradicts the
inequality i ≤ j.

Case 2: Suppose now that pa ∈ X. This implies that v_1 = a, since
pv_1 = u_0v_1 is in X and begins with pa. If v_n = aw, then since v_1 and v_n,
if they are distinct, are not prefix-comparable by Lemma 6.3.3, one has n = 1
and w = 1. If v_n ≠ aw, then
(v_1 ⊗ 1, 1 ⊗ u_1, ..., v_n ⊗ 1, 1 ⊗ u_n, aw ⊗ 1) is a path from v_1 ⊗ 1 to
aw ⊗ 1 (recall that u_naw = qaw ∈ X). Lemma 6.3.3 implies that v_1 = a and
aw, if they are distinct, are not prefix-comparable. Thus, one has again w = 1.
In both cases, qa ∈ X and therefore p · a = 1 = q · a.

We now show that the assumption that v_1 begins with a letter distinct from a
leads to a contradiction (the case where v_n starts with a letter distinct from
a is handled symmetrically). In this case, since u_0v_1 is in X and
u_0av = pav ∈ X, the word u_0 is right-special. Let i be the largest integer
such that v_i begins with a letter distinct from a for 1 ≤ i ≤ n. If i < n,
then u_i is right-special. This contradicts Lemma 6.3.3(iii), since u_0 and u_i
are distinct (because i ≥ 1) elements of the trace on P' of a connected
component of G. If i = n, then u_0 and u_n are right-special since u_nv_n ∈ X
and u_naw = qaw ∈ X. We obtain again a contradiction since u_0 and u_n are
distinct. ■



6.4 Coset automaton 

Let F be a Sturmian set and let X ⊂ F be a bifix code. We introduce a
new automaton denoted B_X or B for short, and called the coset automaton of
X. Let R be the set of classes of θ_X with the class of 1 still denoted 1. The
coset automaton of X is the automaton B_X = (R, 1, 1) with set of states R and
transitions induced by the transitions of the literal automaton A = (P, 1, 1)
of X*. Formally, for r, s ∈ R and a ∈ A, one has r · a = s in the automaton B
if there exist p in the class r and q in the class s such that p · a = q in the
automaton A.

Observe first that the definition is consistent since, by Lemma 6.3.6, if p · a
and p' · a are nonempty and p, p' are in the same class r, then p · a and
p' · a are in the same class. Since the class of p · a is uniquely defined, the
automaton is indeed deterministic.

Observe next that if there is a path from p to p' in the automaton A labeled
w, then there is a path from the class r of p to the class r' of p' labeled w
in B_X.
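The construction of the classes of θ_X and of the coset automaton can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the code X = {a, baab, babaabab, babaabaabab} is inferred from the list of proper prefixes given in Example 6.3.4 (the figure defining the code of Example 6.3.1 is not reproduced here), so it should be read as an assumption rather than as a quotation.

```python
def proper_prefixes(X):
    """Proper prefixes of the words of X, including the empty word."""
    return {x[:i] for x in X for i in range(len(x))}


def theta_classes(X):
    """Classes of theta_X: transitive closure of p ~ q when ps and qs are
    in X for a common nonempty s (computed with a small union-find)."""
    parent = {p: p for p in proper_prefixes(X)}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    by_suffix = {}
    for x in X:
        for i in range(len(x)):                 # x = p s with s nonempty
            by_suffix.setdefault(x[i:], []).append(x[:i])
    for prefixes in by_suffix.values():
        for p in prefixes[1:]:
            parent[find(p)] = find(prefixes[0])
    classes = {}
    for p in parent:
        classes.setdefault(find(p), set()).add(p)
    return {frozenset(c) for c in classes.values()}


def coset_automaton(X, alphabet):
    """Class map and transitions of B_X, induced by the literal automaton:
    p . a = pa if pa is a proper prefix, 1 if pa is in X, undefined otherwise."""
    P = proper_prefixes(X)
    cls = {p: c for c in theta_classes(X) for p in c}
    delta = {}
    for p in P:
        for a in alphabet:
            q = p + a
            if q in X:
                delta.setdefault((cls[p], a), set()).add(cls[""])
            elif q in P:
                delta.setdefault((cls[p], a), set()).add(cls[q])
    return cls, delta


def accepts(cls, delta, word):
    """True if `word` labels a path from state 1 to state 1 in B_X."""
    state = cls[""]
    for a in word:
        if (state, a) not in delta:
            return False
        state = next(iter(delta[(state, a)]))
    return state == cls[""]


X = {"a", "baab", "babaabab", "babaabaabab"}     # assumed code, see lead-in
cls, delta = coset_automaton(X, "ab")
classes = set(cls.values())
```

With this input the sketch finds three classes, each pair (state, letter) has a single target (so the automaton is deterministic, as the text asserts), and the word bb is accepted although it is not in X*.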

Figure 6.6: The automaton B_X.

Example 6.4.1 For the code X of Example 6.3.1, the automaton B_X has three
states. State 2 is the red class, that is the class containing b, and state 3
is the blue class containing ba. The bifix code generating the submonoid
recognized by this automaton is Z = a ∪ b(ab*a)*b. Observe that the word bb is
in Z* but it is not in X*.

The following result shows that the coset automaton of X is the Stallings
automaton of the subgroup generated by X.

Lemma 6.4.2 Let F be a Sturmian set, and let X ⊂ F be a bifix code. The
coset automaton B_X is reversible and describes the subgroup generated by X.
Moreover X ⊂ Z, where Z is the bifix code generating the submonoid recognized
by B_X.

Proof. Let A = (P, 1, 1) be the literal automaton of X* and set
B_X = (R, 1, 1). Let r, s ∈ R and a ∈ A be such that r · a = s · a is nonempty.
Let p, q ∈ P be elements of the classes r and s respectively, such that p · a,
q · a are nonempty. Then pa, qa ∈ P ∪ X. To show that B_X is reversible, it is
enough to show that p ≡ q mod θ_X.

Suppose first that pa ∈ X. Then r · a = s · a = 1 and thus qa ∈ X since 1 is
isolated mod θ_X. Thus p ≡ q mod θ_X.

Suppose next that pa, qa ∈ P. Then there is a path
(1 ⊗ u_0, v_1 ⊗ 1, ..., v_n ⊗ 1, 1 ⊗ u_n) in the incidence graph G of X, with
pa = u_0 and qa = u_n. We may assume that the nodes of the path are pairwise
distinct, except for a possible equality u_0 = u_n.

If all the words u_i end with a, then p ≡ q mod θ_X.

Otherwise, let i be minimal such that u_i ends with a letter distinct from a
and j, with 1 ≤ i ≤ j < n, be maximal such that u_j ends with a letter distinct
from a. Then v_i and v_{j+1} are left-special and they are distinct since
j + 1 > i. This contradicts Lemma 6.3.3(iii) since v_i and v_{j+1} are distinct
elements of the same trace on the set S' of proper nonempty suffixes of X.

Thus the coset automaton is reversible.

Let Z be the bifix code generating the submonoid recognized by B_X. To
show the inclusion X ⊂ Z, consider a word x ∈ X. There is a path from 1 to 1
labeled x in A, hence also in B_X. Since the class of 1 modulo θ_X is reduced
to 1, this path in B_X does not pass by 1 except at its ends. Thus x is in Z.

Let us finally show that the coset automaton describes the group H = ⟨X⟩.
By Proposition 6.1.3, the subgroup described by B_X is equal to ⟨Z⟩. Set
K = ⟨Z⟩. Since X ⊂ Z, we have H ⊂ K. To show the converse inclusion, let us
show by induction on the length of w ∈ A* that, for p, q ∈ P, if there is a
path from the class of p to the class of q in B_X with label w, then Hpw = Hq.
By Proposition 6.3.5, this holds for w = 1. Next, assume that it is true for w
and consider wa with a ∈ A. Assume that there are states p, q, r ∈ P such
that there is a path from the class of p to the class of q in B_X with
label w, and an edge from the class of q to the class of r in B_X with the
label a. By induction hypothesis, we have Hpw = Hq. Next, by definition of
B_X, there is an s ≡ q mod θ_X such that s · a ≡ r mod θ_X. If sa ∈ P, then
s · a = sa, and by Proposition 6.3.5, we have Hs = Hq and Hsa = Hr. Thus
Hpwa = Hqa = Hsa = Hr. Otherwise, sa ∈ X and s · a = r = 1 because the
class of 1 is a singleton. In this case, Hsa = H = Hr. This property shows that
if z ∈ Z, then Hz = H, that is z ∈ H. Thus Z ⊂ H and finally H = K. ■



6.5 Return words

Let F be a factorial set. For u ∈ F, define

\[ \Gamma_F(u) = \{ z \in F \mid uz \in A^+u \cap F \}, \qquad \Gamma'_F(u) = \{ z \in F \mid zu \in uA^+ \cap F \} \]

and

\[ R_F(u) = \Gamma_F(u) \setminus \Gamma_F(u)A^+, \qquad R'_F(u) = \Gamma'_F(u) \setminus A^+\Gamma'_F(u). \]

When F = F(x) for an infinite word x, the sets Γ_F(u) and R_F(u) are
respectively the set of right return words to u and first right return words to
u in x, and Γ'_F(u) and R'_F(u) are respectively the set of left return words
to u and first left return words to u in x. The relation between R_F(u) and
R'_F(u) is simply

\[ u R_F(u) = R'_F(u)\, u. \tag{6.2} \]

Words in the set uR_F(u) = R'_F(u)u are called complete return words in [32].
When there is no ambiguity, we will call the (first) right return words simply
the (first) return words, omitting the 'right' specification.

Example 6.5.1 Let F be the Fibonacci set. The sets R_F(u) and R'_F(u) are
given below for the first small words of F.

u         1   a    b     aa      ab    ba    aab     aba   baa     bab
R_F(u)    a   a    ab    baa     ab    ba    aab     ba    baa     aabab
          b   ba   aab   babaa   aab   aba   abaab   aba   babaa   aabaabab
R'_F(u)   a   a    ba    aab     ab    ba    aab     ab    baa     babaa
          b   ab   baa   aabab   aba   baa   aabab   aba   baaba   babaabaa



Vuillon has shown in [ p6[ that x is a Sturmian word if and only if R'_F(u)
has exactly two elements for every factor u of x. Another proof of this result
is given by Justin and Vuillon in [32].
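First return words can be read off the consecutive occurrences of u in a long enough prefix of x. The Python sketch below illustrates this on the Fibonacci word; the function names and the prefix length 2000 are assumptions of this example, and a finite prefix can only exhibit the return words that actually occur in it (which is enough for the short words u tested here).

```python
def fibonacci_prefix(n):
    """Prefix of length n of the Fibonacci word (fixed point of a -> ab, b -> a)."""
    w = "a"
    while len(w) < n:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:n]


def return_words(x, u):
    """First right and first left return words to u seen in the finite word x.
    If u occurs at consecutive positions i < j, then x[i:j+len(u)] = u z = z' u
    with z = x[i+len(u):j+len(u)] (right) and z' = x[i:j] (left)."""
    occ = [i for i in range(len(x) - len(u) + 1) if x.startswith(u, i)]
    right = {x[i + len(u): j + len(u)] for i, j in zip(occ, occ[1:])}
    left = {x[i:j] for i, j in zip(occ, occ[1:])}
    return right, left


x = fibonacci_prefix(2000)
right_aa, left_aa = return_words(x, "aa")
right_bab, left_bab = return_words(x, "bab")

# every factor of length 1..6 should have exactly two first return words
short_factors = {x[i:i + d] for d in range(1, 7) for i in range(200)}
two_each = all(
    len(r) == 2 and len(l) == 2
    for r, l in (return_words(x, u) for u in short_factors)
)
```

The values obtained for u = aa and u = bab agree with the table of Example 6.5.1, and the relation u R_F(u) = R'_F(u) u of Equation (6.2) holds by construction.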

In fact, they show in [32] the following theorem. Since this result is not
exactly formulated in [32] as stated here, we show how it follows easily from
their article.

Theorem 6.5.2 Let F be a Sturmian set. For any word u ∈ F, the set R_F(u)
(and the set R'_F(u)) is a basis of the free group A°.
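As a small worked illustration of the statement (added here, not taken from the original text), consider the Fibonacci set and u = aa, whose first right return words are y_1 = baa and y_2 = babaa by Example 6.5.1. A direct computation in the free group on {a, b} recovers both letters from y_1 and y_2:

```latex
\begin{align*}
y_2\, y_1^{-1} &= (ba)(baa)(baa)^{-1} = ba, \\
a &= (ba)^{-1}(baa) = y_1\, y_2^{-1}\, y_1, \\
b &= (ba)\, a^{-1} = y_2\, y_1^{-1}\, y_1^{-1}\, y_2\, y_1^{-1}.
\end{align*}
```

Thus {baa, babaa} generates the free group on {a, b}. Since this group has rank 2 and a generating set of a free group of rank 2 with two elements is a basis, the set is a basis, as the theorem predicts.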



By Equation (6.2), the sets R_F(u) and R'_F(u) are conjugate in the free group.
Conjugacy by an element u is an automorphism of the free group. It follows that
R_F(u) is a basis if and only if R'_F(u) is a basis. Thus, it suffices to prove
the claim for R'_F(u). We quote the following result of [32, Theorem 4.4,
Corollaries 4.1 and 4.5], with the notations of Section 2.3.



Proposition 6.5.3 Let s be a standard strict episturmian word over A, let
Δ = a_0a_1··· be its directive word, and let (u_n) be its sequence of palindrome
prefixes.

(i) The first left return words to u_n are the words ψ_{a_0···a_{n−1}}(a) for
a ∈ A.

(ii) For each factor u of s, there exist a word z and an integer n such that the
first left return words to u are the words zyz^{-1}, where y ranges over the
first left return words to u_n.


Proof of Theorem 6.5.2. We may assume that F = F(s) for some standard and
strict episturmian word s. By Proposition 6.5.3(i), the set of first left return
words to u_n is the image of the alphabet by the endomorphism
ψ_{a_0···a_{n−1}}. It is easily seen that these endomorphisms define
automorphisms of the free group. We deduce that the set of first left return
words to u_n is a basis of the free group on A. By Proposition 6.5.3(ii), the
set of first left return words to u is a basis, too. This ends the proof. ■

6.6 Proof of the main result

Some preliminary results are needed for the proof of Theorem 6.2.1.



Proposition 6.6.1 Let F be a Sturmian set and let X ⊂ F be a finite
F-maximal bifix code. Then ⟨X⟩ ∩ F = X* ∩ F.

Proof. We have X* ∩ F ⊂ ⟨X⟩ ∩ F. To show the converse inclusion, consider
the bifix code Z generating the submonoid recognized by the coset automaton
B_X associated to X.

Let us show that Z ∩ F = X. By Lemma 6.4.2, we have X ⊂ Z and thus
X ⊂ Z ∩ F. Since X is an F-maximal bifix code, this implies that X = Z ∩ F.

Since any reversible automaton is minimal and since the automaton B_X is
reversible by Lemma 6.4.2, it is equal to the minimal automaton of Z*. Let K
be the subgroup generated by Z. By Proposition 6.1.1, we have K ∩ A* = Z*.
This shows that

\[ \langle X \rangle \cap F \subset K \cap F = K \cap A^* \cap F = Z^* \cap F = X^* \cap F. \]

The first inclusion holds because X ⊂ Z implies ⟨X⟩ ⊂ K. The last equality
follows from the fact that if z_1 ··· z_n ∈ F with z_1, ..., z_n ∈ Z, then each
z_i is in F hence in Z ∩ F = X. Thus ⟨X⟩ ∩ F ⊂ X* ∩ F, which was to be
proved. ■



We will use the following consequence of Proposition 6.6.1.

Corollary 6.6.2 Let F be a Sturmian set and let X ⊂ F be a finite F-maximal
bifix code. Each right coset of the subgroup ⟨X⟩ generated by X contains at most
one right-special proper prefix of X.

Proof. Set H = ⟨X⟩. Let Q be the set of those proper prefixes of the words of
X which are right-special.

Let us show that if p, q ∈ Q belong to the same right coset, then p = q. Since
right-special words of F are suffix-comparable, we may assume that p = uq.
Since Hp = Hq, one has Huq = Hq. Consequently, Hu = H and thus u ∈ H. By
Proposition 6.6.1, since u ∈ F, this implies that u ∈ X* and thus u = 1 since
p is a proper prefix of X.



Proof of Theorem 6.2.1.

Assume first that X is an F-maximal bifix code of F-degree d. Let P be
the set of proper prefixes of X. Let Q be the set of words in P which are
right-special. Let H be the subgroup generated by X.

By Lemma 5.2.3 there is a right-special word u such that δ_X(u) = d. The d
suffixes of u which are in P are the elements of Q. By Theorem i.2.i , the word
u is not an internal factor of X.

Let

\[ V = \{ v \in A^\circ \mid Qv \subset HQ \}. \]

Any v ∈ V defines a permutation of Q. Indeed, suppose that for p, q ∈ Q, one
has pv, qv ∈ Hr for some r ∈ Q. Then rv^{-1} is in Hp ∩ Hq. This forces
Hp = Hq and thus p = q by Corollary 6.6.2.

Figure 6.7: A word y ∈ R_F(u).

The set V is a subgroup of A°. Indeed, 1 ∈ V. Next, let v ∈ V. Then
for any q ∈ Q, since v defines a permutation of Q, there is a p ∈ Q such that
pv ∈ Hq. Then qv^{-1} ∈ Hp. This shows that v^{-1} ∈ V. Next, if v, w ∈ V, then
Qvw ⊂ HQw ⊂ HQ and thus vw ∈ V.

We show that the set R_F(u) is contained in V. Indeed, let q ∈ Q and
y ∈ R_F(u). Since q is a suffix of u, qy is a suffix of uy, and since uy is in
F (by definition of R_F(u)), also qy is in F. The fact that X is F-maximal
implies that there is a word r ∈ P such that qy ∈ X*r. We verify that the word
r is a suffix of u. Since y ∈ R_F(u), there is a word y' such that uy = y'u.
Consequently, r is a suffix of y'u, and in fact the word r is a suffix of u.
Indeed, one has |r| ≤ |u| since otherwise u is in I(X) and this is not the
case. Thus we have r ∈ Q (see Figure 6.7). Since X* ⊂ H and r ∈ Q, we have
qy ∈ HQ. Thus y ∈ V.

By Theorem 6.5.2, the group generated by R_F(u) is A°. Since R_F(u) ⊂ V
and since V is a subgroup of A°, we have V = A°. Thus Qw ⊂ HQ for any
w ∈ A°. Since 1 ∈ Q, we have in particular w ∈ HQ. Thus A° = HQ. Since
Card(Q) = d, and since the right cosets Hq for q ∈ Q are pairwise disjoint,
this shows that H is a subgroup of index d. By Theorem 5.2.1 and in view of
Schreier's Formula, X is a basis of H.

Assume conversely that the bifix code X ⊂ F is a basis of the group H = ⟨X⟩
and that ⟨X⟩ has index d. Since X is a basis, by Schreier's Formula, we have
Card(X) = (k − 1)d + 1, where k = Card(A). The case k = 1 is straightforward;
thus we assume k ≥ 2. By Theorem 4.4.3, there is a finite F-maximal bifix code
Y containing X. Let e be the F-degree of Y. By the first part of the proof, Y
is a basis of a subgroup K of index e of A°. In particular, it has (k − 1)e + 1
elements. Since X ⊂ Y, we have (k − 1)d + 1 ≤ (k − 1)e + 1 and thus d ≤ e. On
the other hand, since H is included in K, d is a multiple of e and thus e ≤ d.
We conclude that d = e and thus that X = Y. ■

Example 6.6.3 Let F be the Fibonacci set. Let X ⊂ F be the bifix code shown
on Figure 6.8. The right-special proper prefixes of the words of X are the four
suffixes of aba and are indicated in black on the figure. The states of the
coset automaton are the sets {1}, {a, bab, abaab}, {aba, b, baa} and
{ba, ab, abaa}. The code X has F-degree 4. Each state is represented by its
right-special factor in Figure 6.9.

We end this section with a combinatorial consequence of Theorem 6.2.1.



Figure 6.8: An F-maximal bifix code of F-degree 4.

Figure 6.9: The associated coset automaton.

Proposition 6.6.4 Let F be a Sturmian set on an alphabet with k letters and
let X ⊂ F be a finite F-maximal bifix code of F-degree d. Let P (resp. S) be
the set of proper prefixes (resp. suffixes) of X. Then

\[ \sum_{x \in X} |x| = \operatorname{Card}(P) + \operatorname{Card}(S) + (k-2)d. \]

We will use the following proposition, of independent interest.

Proposition 6.6.5 Let F be a Sturmian set and let X ⊂ F be a finite
F-maximal bifix code of F-degree d. The coset automaton B_X is a group
automaton with d states. Each state of B_X other than 1 is an F-maximal suffix
code.

Proof. Let A = (P, 1, 1) be the literal automaton recognizing X* and let
B = (R, 1, 1) be the coset automaton of X. By Lemma 6.4.2, the automaton B is
reversible and describes the subgroup H generated by X.

By Theorem 6.2.1, the subgroup H has index d in A°. Since B is reversible,
it is minimal. Propositions 6.1.6 and 6.1.4 show that B is a group automaton.
Its number of states is d since a group automaton which describes a subgroup
of index d has d states by Proposition 6.1.5.

Finally, consider r ∈ R and let X_r = {p ∈ P | 1 · p = r}. Let us show that
any w ∈ F is suffix-comparable with an element of X_r. We may assume that
w is longer than any word of X. Since B_X is a group automaton, there is a
u ∈ R such that u · w = r. Since w is longer than any word of X, the path from
u to r labeled w passes through state 1. Thus w has a parse (s, x, p) such that
1 · p = r and thus w has a suffix in X_r. This shows that X_r is an F-maximal
suffix code. ■

Note that the fact that the set P \ 1 of nonempty proper prefixes of X is a
disjoint union of d − 1 F-maximal suffix codes is also a consequence of the
dual statement of Theorem 4.3.7.



Proof of Proposition 6.6.4. Let H be the subgroup generated by X. By
Theorem 6.2.1, the set X is a basis of H and the index of H is equal to
d = d_F(X). Let G be the incidence graph of X. Let E be the set of edges of G.
One has

\[ \operatorname{Card}(E) = \sum_{x \in X} (|x| - 1) = \sum_{x \in X} |x| - \operatorname{Card}(X) = \sum_{x \in X} |x| - (k-1)d - 1. \]

By Proposition 6.6.5, the classes of θ_X are the set {1} and d − 1 F-maximal
suffix codes denoted P_i, for i = 1, ..., d − 1. Each of the latter is the trace
on P \ 1 of a connected component C_i of G. Let G_i be the subgraph of G
induced by its connected component C_i. By Lemma 6.3.3, G_i is a tree.

Similarly, let S_i be the trace on S \ 1 of the connected component C_i. Let
E_i be the set of edges of G_i. Since G_i is a tree, we have
Card(E_i) = Card(P_i) + Card(S_i) − 1 for i = 1, ..., d − 1. Finally

\[ \operatorname{Card}(E) = \sum_{i=1}^{d-1} \operatorname{Card}(E_i) = \sum_{i=1}^{d-1} \bigl(\operatorname{Card}(P_i) + \operatorname{Card}(S_i) - 1\bigr) = \operatorname{Card}(P \setminus 1) + \operatorname{Card}(S \setminus 1) - (d-1), \]

whence the result.
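The identity of Proposition 6.6.4 can be checked numerically on small codes. In the Python sketch below, the function names are hypothetical, the Fibonacci set is approximated by a finite prefix of the Fibonacci word, and the four-word code of degree 3 is inferred from the prefixes listed in Example 6.3.4; both are assumptions of this example.

```python
def fibonacci_prefix(n):
    """Prefix of length n of the Fibonacci word (fixed point of a -> ab, b -> a)."""
    w = "a"
    while len(w) < n:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:n]


def factors(x, d):
    """Set of factors of length d occurring in the finite word x."""
    return {x[i:i + d] for i in range(len(x) - d + 1)}


def proper_prefixes(X):
    """Proper prefixes of the words of X, including the empty word."""
    return {x[:i] for x in X for i in range(len(x))}


def proper_suffixes(X):
    """Proper suffixes of the words of X, including the empty word."""
    return {x[i:] for x in X for i in range(1, len(x) + 1)}


def identity_holds(X, k, d):
    """Check sum |x| = Card(P) + Card(S) + (k - 2) d for the code X."""
    total_length = sum(len(x) for x in X)
    return total_length == (
        len(proper_prefixes(X)) + len(proper_suffixes(X)) + (k - 2) * d
    )


x = fibonacci_prefix(500)
# the words of length d of the Fibonacci set form an F-maximal bifix code of F-degree d
uniform_ok = all(identity_holds(factors(x, d), 2, d) for d in range(1, 9))
# assumed code of F-degree 3 with kernel {a, baab}
degree3_ok = identity_holds({"a", "baab", "babaabab", "babaabaabab"}, 2, 3)
```

For the degree 3 code, the total length is 1 + 4 + 8 + 11 = 24, and there are 12 proper prefixes and 12 proper suffixes (counting the empty word), so both sides equal 24 when k = 2.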



7 Syntactic groups 



Let F be a recurrent subset of A*. In this section, we introduce the notion of
F-group of a bifix code X ⊂ F of finite F-degree. It is a permutation group of
degree d_F(X). We investigate the relation between this group and the notion of
group of a maximal bifix code (Theorem 7.2.5). We use Theorem 6.2.1 to prove
a new result on the syntactic groups of bifix codes: any transitive permutation
group G of degree d and with k generators is a syntactic group of a bifix code
with (k − 1)d + 1 elements (Theorem 7.2.3).



2177 7.1 Preliminaries 

2178 We first recall the basic terminology on groups in monoids (see Q for a more 

2179 detailed exposition). We are mainly concerned with monoids of maps from a 

2180 set into itself. The maps considered in this section are partial maps. 

2181 Let M be a monoid. A group in M is a subsemigroup of M which is iso- 

2182 morphic to a group. Note that the neutral element of a group contained in M 

2183 needs not be equal to the neutral element of M . 

2184 A group in M is maximal if it not included in another group in M . 
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2185 Proposition 7.1.1 Let G be a group in a monoid M of partial maps from a 

2186 set Q into itself. All elements of G have the same image I . The restriction of 

2187 the elements of G to I is a faithful representation of G as a permutation group 

2188 on I. 

Proof. Two elements $g, h \in G$ have the same image. Indeed, let $k$ be the inverse of $g$ in $G$. Then $h = hkg$ and thus the image of $h$ is contained in the image of $g$. The converse inclusion is shown analogously. Then $G$ is a permutation group on the common image $I$ of its elements. Indeed, let $e$ be the neutral element of $G$. Then for any $p \in I$, let $q \in Q$ be such that $qe = p$. Then $pe = qe^2 = qe = p$. This shows that $e$ is the identity on $I$. Next, for any $g \in G$ the inverse $k$ of $g$ is such that $gk = kg = e$. Thus $g$ is a permutation on $I$.

Let $g, g' \in G$ be such that they have the same restriction to $I$. Then for each $p \in Q$, $p(eg) = (pe)g = (pe)g' = p(eg')$ since $pe \in I$. Since $eg = g$ and $eg' = g'$, we obtain $g = g'$. This shows that the representation of $G$ by permutations on $I$ is faithful. ■
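Proposition 7.1.1 can be illustrated on a toy example. The sketch below is not taken from the paper: the partial maps `e` and `g` on the hypothetical set $Q = \{0, 1, 2, 3\}$ and all helper names are assumptions chosen for illustration. It checks that $\{e, g\}$ is a group in the monoid of partial maps whose neutral element is not the identity of $Q$, that both elements share the same image, and that restricting to this image gives distinct permutations.

```python
# Toy illustration of Proposition 7.1.1 (hypothetical example data).
# Partial maps on Q = {0, 1, 2, 3} are dicts; a missing key means "undefined".

def compose(f, g):
    """Right action: p(fg) = (pf)g, undefined as soon as one step is undefined."""
    return {p: g[f[p]] for p in f if f[p] in g}

def image(f):
    """Set of values taken by the partial map f."""
    return frozenset(f.values())

def restrict(f, subset):
    """Restriction of f to the given subset of its domain."""
    return {p: f[p] for p in subset if p in f}

# e is idempotent but is not the identity map of Q; both maps are undefined on 3.
e = {0: 0, 1: 1, 2: 0}
g = {0: 1, 1: 0, 2: 1}
G = [e, g]

# G is closed under composition, e is neutral in G, and every element has an
# inverse in G: G is a group in the monoid of partial maps (isomorphic to Z/2Z).
closed = all(compose(x, y) in G for x in G for y in G)
neutral = all(compose(e, x) == x == compose(x, e) for x in G)
has_inverses = all(any(compose(x, y) == e for y in G) for x in G)

# All elements share the same image I; their restrictions to I are
# distinct permutations of I, so the representation is faithful.
I = image(e)
restrictions = [restrict(x, I) for x in G]

print(closed, neutral, has_inverses, sorted(I), restrictions)
```

Here the common image is $\{0, 1\}$; `e` restricts to the identity on it and `g` to the transposition of 0 and 1.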






Let $G$ be a group in a monoid of maps from $Q$ into itself as above. The canonical representation of $G$ by permutations is the restriction of the maps in $G$ to their common image.

A syntactic group of a prefix code $X$ is the canonical representation by permutations of a maximal group in the monoid of transitions of the minimal automaton $\mathcal{A}(X^*)$ of $X^*$.

Let $X$ be a prefix code and let $\mathcal{A} = \mathcal{A}(X^*)$. A syntactic group $G$ of $X$ is called special if $\varphi_{\mathcal{A}}^{-1}(G)$ is a cyclic submonoid of $A^*$. In particular a special syntactic group is cyclic.

The degree of a permutation group $G$ on a set $R$ is the cardinality of $R$. Recall that the group $G$ is transitive if for any $r, s \in R$ there is some $g \in G$ such that $rg = s$.

A permutation group $G$ on a set $R$ and a permutation group $H$ on a set $S$ are equivalent if there exists a bijection $\beta : R \to S$ and an isomorphism $\alpha : G \to H$ such that, for all $g \in G$ and $r \in R$, one has

$$\beta(rg) = \beta(r)\alpha(g),$$

in other terms, if the diagram of Figure 7.1 is commutative for all $g \in G$.

Figure 7.1: Equivalent permutation groups.

Let us recall the notation concerning Green relations in a monoid $M$ (see [?]). We denote by $\mathcal{R}$ the equivalence in $M$ defined by $m \mathcal{R} n$ if $m, n$ generate the same right ideal, i.e. if $mM = nM$. We denote by $R(m)$ the $\mathcal{R}$-class of $m$.

Symmetrically, we denote by $\mathcal{L}$ the equivalence defined by $m \mathcal{L} n$ if $m, n$ generate the same left ideal, i.e. if $Mm = Mn$. We denote by $L(m)$ the $\mathcal{L}$-class of $m$.

It is well known that the equivalences $\mathcal{L}$ and $\mathcal{R}$ commute. We denote by $\mathcal{D}$ the equivalence $\mathcal{L}\mathcal{R} = \mathcal{R}\mathcal{L}$. Finally, we denote by $\mathcal{H}$ the equivalence $\mathcal{L} \cap \mathcal{R}$.

A $\mathcal{D}$-class $D$ is regular if it contains an idempotent. In this case, there is at least one idempotent in each $\mathcal{L}$-class of $D$ and in each $\mathcal{R}$-class of $D$. The following statement is known as Clifford and Miller's Lemma. For $m, n \in M$, one has $mn \in R(m) \cap L(n)$ if and only if $R(n) \cap L(m)$ contains an idempotent.
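The Green relations and Clifford and Miller's Lemma can be explored by brute force on a small example. The following sketch is an illustration under stated assumptions, not part of the paper: the generators `a` and `b` are hypothetical partial maps on three states, and the helper names are ours. It computes the $\mathcal{R}$- and $\mathcal{L}$-classes directly from the definitions $mM = nM$ and $Mm = Mn$, then checks the lemma for every pair of elements.

```python
# Brute-force check of Clifford and Miller's Lemma on a small monoid of
# partial maps. The generators a and b are hypothetical example data.

def mul(f, g):
    """Right action on Q = {0, ..., n-1}: p(fg) = (pf)g; None means undefined."""
    return tuple(None if q is None else g[q] for q in f)

def generate(gens):
    """Submonoid generated by gens (the identity map is included)."""
    identity = tuple(range(len(gens[0])))
    monoid, todo = {identity}, [identity]
    while todo:
        m = todo.pop()
        for g in gens:
            x = mul(m, g)
            if x not in monoid:
                monoid.add(x)
                todo.append(x)
    return monoid

def classes(ideal_of):
    """Group the elements generating the same ideal; returns element -> class."""
    groups = {}
    for m, ideal in ideal_of.items():
        groups.setdefault(ideal, set()).add(m)
    return {m: groups[ideal] for m, ideal in ideal_of.items()}

a = (1, 2, 0)        # a cyclic permutation of the three states
b = (1, 0, None)     # a partial map, undefined on state 2
M = generate([a, b])

# m R n iff mM = nM, and m L n iff Mm = Mn.
R = classes({m: frozenset(mul(m, x) for x in M) for m in M})
L = classes({m: frozenset(mul(x, m) for x in M) for m in M})

def lemma_holds(m, n):
    """mn lies in R(m) & L(n) exactly when R(n) & L(m) contains an idempotent."""
    product_in_place = mul(m, n) in (R[m] & L[n])
    idempotent_found = any(mul(e, e) == e for e in R[n] & L[m])
    return product_in_place == idempotent_found

clifford_miller_ok = all(lemma_holds(m, n) for m in M for n in M)
print(len(M), clifford_miller_ok)
```

Computing the ideals explicitly is quadratic in the size of the monoid, which is acceptable only for very small examples such as this one.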

Assume that $M$ is a monoid of maps from a finite set $Q$ into itself. If $m, n \in M$ are $\mathcal{L}$-equivalent, then they have the same image. If they are $\mathcal{R}$-equivalent, then they have the same nuclear equivalence (the nuclear equivalence of a partial map $m$ from $Q$ into itself is the partial equivalence, for which $p, q \in Q$ are equivalent if $m$ is defined on $p$ and $q$ and $pm = qm$).

If $m, n \in M$ are $\mathcal{H}$-equivalent, they have the same image and the same nuclear equivalence. The converse is not true but it holds in the following important particular case.

Proposition 7.1.2 Let $M$ be a monoid of maps from a finite set $Q$ into itself. Let $e \in M$ be an idempotent. An element $m$ of $M$ is in the $\mathcal{H}$-class of $e$ if and only if it has the same nuclear equivalence and the same image as $e$.

Proof. If $m$ and $e$ are $\mathcal{H}$-equivalent, they have the same nuclear equivalence $\rho$ and the same image $I$. Conversely, assume that $m$ has the same nuclear equivalence $\rho$ and the same image $I$ as $e$. We have $me = m$ since $e$ is the identity on its image $I$. For any $p \in Q$, $pe^2 = pe$ implies that $p$ and $pe$ are in the same class of $\rho$. This implies that $pem = pm$. Thus $em = m$.

Finally, the restriction of $m$ to $I$ is a permutation. Indeed, $pm = qm$ for $p, q \in I$ implies $pe = qe$ which forces $p = q$. Let $k > 0$ be such that the restriction of $m^k$ to $I$ is the identity. Then $m^k$ and $e$ are two idempotents with the same nuclear equivalence and the same image. This implies that they are equal. Thus $m$ and $e$ are in the same $\mathcal{H}$-class. ■
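Proposition 7.1.2 lends itself to a direct computational check. The sketch below assumes a small hypothetical monoid of partial maps (the two generators are example data of ours, not from the paper) and compares, for each idempotent $e$, the $\mathcal{H}$-class of $e$ computed from the ideals with the set of elements sharing the image and the nuclear equivalence of $e$.

```python
# Brute-force check of Proposition 7.1.2 on a small monoid of partial maps.
# The generators are hypothetical example data, not taken from the paper.

def mul(f, g):
    """Right action on Q = {0, ..., n-1}: p(fg) = (pf)g; None means undefined."""
    return tuple(None if q is None else g[q] for q in f)

def generate(gens):
    """Submonoid generated by gens (the identity map is included)."""
    identity = tuple(range(len(gens[0])))
    monoid, todo = {identity}, [identity]
    while todo:
        m = todo.pop()
        for g in gens:
            x = mul(m, g)
            if x not in monoid:
                monoid.add(x)
                todo.append(x)
    return monoid

def image(m):
    """Set of states reached by the partial map m."""
    return frozenset(q for q in m if q is not None)

def nuclear_equivalence(m):
    """Classes of the states on which m is defined, grouped by their value pm."""
    classes = {}
    for p, q in enumerate(m):
        if q is not None:
            classes.setdefault(q, set()).add(p)
    return frozenset(frozenset(c) for c in classes.values())

M = generate([(1, 2, 0), (1, 0, None)])
right_ideal = {m: frozenset(mul(m, x) for x in M) for m in M}
left_ideal = {m: frozenset(mul(x, m) for x in M) for m in M}

def h_class(e):
    """Elements generating the same right ideal and the same left ideal as e."""
    return {m for m in M
            if right_ideal[m] == right_ideal[e] and left_ideal[m] == left_ideal[e]}

def same_image_and_kernel(e):
    """Elements with the same image and the same nuclear equivalence as e."""
    return {m for m in M
            if image(m) == image(e)
            and nuclear_equivalence(m) == nuclear_equivalence(e)}

idempotents = [e for e in M if mul(e, e) == e]
proposition_ok = all(h_class(e) == same_image_and_kernel(e) for e in idempotents)
print(len(idempotents), proposition_ok)
```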

Let $F$ be a recurrent set and let $X \subset F$ be a bifix code of finite $F$-degree $d$. Let $\mathcal{A} = (Q, 1, 1)$ be a simple automaton recognizing $X^*$. We set $\varphi = \varphi_{\mathcal{A}}$ and we denote by $M$ the transition monoid $\varphi(A^*)$ of $\mathcal{A}$.

For a word $w$, we denote by $\mathrm{Im}(w)$ the image of $w$ with respect to $\mathcal{A}$, that is the set $\mathrm{Im}(w) = \{p \cdot w \mid p \in Q\}$. The rank of $w$ (with respect to the automaton $\mathcal{A}$) is the number $\mathrm{rank}(w) = \mathrm{Card}(\mathrm{Im}(w))$. Then $\mathrm{Im}(w)$ is also the image of the map $\varphi(w)$ (recall that the action of $M$ is on the right of the elements of $Q$), and the rank of $w$ is also the rank of $\varphi(w)$. Clearly $\mathrm{rank}(uwv) \le \mathrm{rank}(w)$ for all $u, w, v$.
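The notions of image and rank of a word are easy to compute. The sketch below uses a three-state automaton that we assume recognizes $X^*$ for $X = \{ab, ba\}$ (the code of Example 7.2.1); the transition table `DELTA` and the function names are our own illustration. It computes $\mathrm{Im}(w)$ and $\mathrm{rank}(w)$ and checks the inequality $\mathrm{rank}(uwv) \le \mathrm{rank}(w)$ on all words of length at most 3.

```python
# Image and rank of words in a small automaton. The automaton is our assumed
# automaton for X* with X = {ab, ba}; state 1 is both initial and final.
from itertools import product

DELTA = {(1, "a"): 2, (1, "b"): 3, (2, "b"): 1, (3, "a"): 1}
STATES = (1, 2, 3)

def act(state, word):
    """Return state . word, or None when the path is undefined."""
    for letter in word:
        state = DELTA.get((state, letter))
        if state is None:
            return None
    return state

def im(word):
    """Im(w) = { p . w | p in Q and p . w is defined }."""
    return {act(p, word) for p in STATES} - {None}

def rank(word):
    """rank(w) = Card(Im(w))."""
    return len(im(word))

# All words of length at most 3 over {a, b}.
words = ["".join(t) for n in range(4) for t in product("ab", repeat=n)]
monotone = all(rank(u + w + v) <= rank(w)
               for u in words for w in words for v in words)

print(rank("ab"), rank("aa"), rank("aaa"), monotone)  # prints: 2 1 0 True
```

In this example the word $ab$ has rank 2, which agrees with the $F$-degree 2 stated for this code in Example 7.2.1.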

Proposition 7.1.3 The set of elements of $\varphi(F)$ of rank $d$ is included in a regular $\mathcal{D}$-class of $M$.

We use the following lemmas.



Lemma 7.1.4 A word $w \in F$ which has $d$ parses with respect to $X$ has rank $d$ with respect to $\mathcal{A}$. Moreover, $\mathrm{Im}(w)$ is the set of states $1 \cdot p$ for all $p$ such that there is a parse $(s, x, p)$ of $w$. For all $q \in \mathrm{Im}(w)$, there is a unique proper prefix $p$ of $X$ which is a suffix of $w$, and such that $q = 1 \cdot p$.

Proof. Consider first two states $q, r \in Q$ and suppose that $q \cdot w = r$. Since $\mathcal{A}$ is simple, it is trim. Consequently there exist two words $u, v$ such that $1 \cdot u = q$ and $r \cdot v = 1$. It follows that $uwv \in X^*$. Since $w$ has $d$ parses, by Theorem 4.2.8 it is not an internal factor of a word in $X$. Thus there is a parse $(s, x, p)$ of $w$ such that $us, pv \in X^*$. Then $r = 1 \cdot p$. The relation $r \mapsto (s, x, p)$ is a function. Indeed, let us show that if $(s, x, p)$ and $(s', x', p')$ are two distinct parses of $w$, then $1 \cdot p \ne 1 \cdot p'$. Assume the contrary. Then we have $pv, p'v \in X^*$ for the same word $v$. Since $p, p'$ are suffixes of $w$, they are suffix-comparable and thus $p = p'$ since $X$ is bifix. This is impossible if the parses are distinct. Of course, the function $r \mapsto (s, x, p)$ is injective since $\mathcal{A}$ is deterministic.

Conversely, let $(s, x, p)$ be a parse of $w$. Since $X$ is an $F$-maximal bifix code, there exist by Theorem 4.2.2 words $u, v$ such that $us, pv \in X^*$. Thus we have $1 \cdot us = 1 \cdot x = 1 \cdot pv = 1$. Consequently $(1 \cdot u) \cdot w = 1 \cdot usxp = 1 \cdot xp = 1 \cdot p$. This shows that $1 \cdot p \in \mathrm{Im}(w)$.



Lemma 7.1.5 Let $u \in F$ be a word. If $\mathrm{rank}(u) = d$, then $\mathrm{rank}(uv) = d$ for all $v$ such that $uv \in F$.

Proof. Since $X$ is $F$-thin, there exists $w \in F$ which is not a factor of a word in $X$. This word $w$ has $d$ parses. Assume $uv \in F$. Since $F$ is recurrent, there exists a word $t$ such that $uvtw \in F$. Then $uvtw$ also has $d$ parses. By Lemma 7.1.4, this implies that the rank of $uvtw$ is $d$. Since $d = \mathrm{rank}(uvtw) \le \mathrm{rank}(uv) \le \mathrm{rank}(u) = d$, one has $\mathrm{rank}(uv) = d$. ■



Proof of Proposition 7.1.3. Let $u, v \in F$ be two words of rank $d$. Set $m = \varphi(u)$ and $n = \varphi(v)$. Let $w$ be such that $uwv \in F$. We show first that $m \mathcal{R} \varphi(uwv)$ and $n \mathcal{L} \varphi(uwv)$.

For this, let $t$ be such that $uwvtu \in F$. Set $z = wvtu$. By Lemma 7.1.5, the rank of $uz$ is $d$. Since $\mathrm{Im}(uz) \subset \mathrm{Im}(z) \subset \mathrm{Im}(u)$, this implies that the images are equal. Consequently, the restriction of $\varphi(z)$ to $\mathrm{Im}(u)$ is a permutation. Since $\mathrm{Im}(u)$ is finite, there is an integer $\ell \ge 1$ such that $\varphi(z)^{\ell}$ is the identity on $\mathrm{Im}(u)$. Set $e = \varphi(z)^{\ell}$ and $s = tuz^{\ell-1}$. Then, since $e$ is the identity on $\mathrm{Im}(u)$, one has $m = me$. Thus $m = \varphi(uwv)\varphi(s)$, and since $\varphi(uwv) = m\varphi(wv)$, it follows that $m$ and $\varphi(uwv)$ are $\mathcal{R}$-equivalent.

Similarly $n$ and $\varphi(uwv)$ are $\mathcal{L}$-equivalent. Indeed, set $z' = tuwv$. Then $\mathrm{Im}(vz') \subset \mathrm{Im}(z') \subset \mathrm{Im}(v)$. Since $vz'$ is a factor of $z^2$ and $z$ has rank $d$, it follows that $d = \mathrm{rank}(z^2) \le \mathrm{rank}(vz') \le \mathrm{rank}(v) = d$. Therefore, $vz'$ has rank $d$ and consequently the images $\mathrm{Im}(vz')$, $\mathrm{Im}(z')$ and $\mathrm{Im}(v)$ are equal. There is an integer $\ell' \ge 1$ such that $\varphi(z')^{\ell'}$ is the identity on $\mathrm{Im}(v)$. Set $e' = \varphi(z')^{\ell'}$. Then $n = ne' = n\varphi(z')^{\ell'-1}\varphi(tuwv) = nq\varphi(uwv)$, with $q = \varphi(z')^{\ell'-1}\varphi(t)$. Since $\varphi(uwv) = \varphi(uw)n$, one has $n \mathcal{L} \varphi(uwv)$. Thus $m, n$ are $\mathcal{D}$-equivalent, and $\varphi(uwv) \in R(m) \cap L(n)$.

Set $p = \varphi(wv)$. Then $p = \varphi(w)n$ and, with the previous notation, $n = ne' = nq\varphi(u)p$, so $L(n) = L(p)$. Thus $mp = \varphi(uwv) \in R(m) \cap L(p)$, and by Clifford and Miller's Lemma, $R(p) \cap L(m)$ contains an idempotent. Thus the $\mathcal{D}$-class of $m$, $p$ and $n$ is regular. ■



7.2 Group of a bifix code

Let $M$ be a monoid. The $\mathcal{H}$-class of an idempotent $e$ is denoted $H(e)$. It is the maximal group contained in $M$ and containing $e$.

All groups $H(e)$ for $e$ idempotent in a regular $\mathcal{D}$-class $D$ are isomorphic. The structure group (or Schützenberger group) of $D$ is any one of them. When $M$ is a monoid of maps from a set $Q$ into itself, the canonical representations of the groups $H(e)$ are equivalent permutation groups. See [?, Proposition 9.1.9]. We then also consider the structure group as a permutation group.

Let $F$ be a recurrent set and let $X \subset F$ be a bifix code of finite $F$-degree $d$. Let $\mathcal{A} = (Q, 1, 1)$ be a simple automaton recognizing $X^*$. Set $\varphi = \varphi_{\mathcal{A}}$. The structure group of the $\mathcal{D}$-class of elements of rank $d$ of $\varphi(F)$ is a permutation group of degree $d$. By Proposition 9.5.1 in [?] this group does not depend on the choice of the simple automaton $\mathcal{A}$ recognizing $X^*$. It is called the $F$-group of the code $X$ and denoted $G_F(X)$.

When $F = A^*$, the group $G_F(X)$ is the group $G(X)$ of the code $X$ defined in [?]. Indeed, in this case, the $\mathcal{D}$-class of elements of rank $d$ coincides with the minimal ideal of the monoid $\varphi(A^*)$.

The following example shows that the $F$-group of an $F$-maximal bifix code is not always transitive.



Example 7.2.1 Let $X = \{ab, ba\}$ and let $F = F((ab)^*)$. Then $X$ is an $F$-maximal bifix code of $F$-degree 2. It can be verified easily that the syntactic monoid of $X^*$ contains only trivial subgroups (see also Exercises 7.1.1, 7.2.1 in [?]). Thus $G_F(X)$ is reduced to the identity.
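The claim of Example 7.2.1 can be checked mechanically. The sketch below is an illustration under assumptions: the three-state automaton is the one we take to be the minimal automaton of $X^*$ for $X = \{ab, ba\}$, and the helper names are ours. It generates the transition monoid and verifies that the powers of every element stabilise ($m^n = m^{n+1}$ for some $n$), which for a finite monoid is equivalent to having only trivial subgroups.

```python
# Check that the transition monoid of a small automaton for X* with
# X = {ab, ba} has only trivial subgroups. The automaton is our assumption.

STATES = (1, 2, 3)
LETTERS = {
    "a": {1: 2, 3: 1},   # partial map induced by the letter a
    "b": {1: 3, 2: 1},   # partial map induced by the letter b
}

def as_tuple(partial):
    """Encode a partial map as a tuple indexed like STATES (None = undefined)."""
    return tuple(partial.get(p) for p in STATES)

def mul(f, g):
    """Right action: p(fg) = (pf)g; None means undefined."""
    return tuple(None if q is None else g[STATES.index(q)] for q in f)

def transition_monoid():
    """All maps induced by words, obtained by closure from the identity."""
    identity = as_tuple({p: p for p in STATES})
    gens = [as_tuple(m) for m in LETTERS.values()]
    monoid, todo = {identity}, [identity]
    while todo:
        m = todo.pop()
        for g in gens:
            x = mul(m, g)
            if x not in monoid:
                monoid.add(x)
                todo.append(x)
    return monoid

def is_aperiodic(m):
    """True when the powers of m stabilise: m^n = m^(n+1) for some n."""
    seen, power = [], m
    while power not in seen:
        seen.append(power)
        power = mul(power, m)
    # 'power' is the first repeated power; the cycle of powers has length 1
    # exactly when it equals the last power computed before the repetition.
    return power == seen[-1]

M = transition_monoid()
only_trivial_subgroups = all(is_aperiodic(m) for m in M)
print(only_trivial_subgroups)  # prints: True
```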



Example 7.2.2 We consider again the code of Example 3.6.3. The minimal automaton of $X^*$ is represented on Figure 7.2.

Figure 7.2: An $F$-maximal bifix code of $F$-degree 4.

[Figure 7.3, recovered labels only. Column headers (common images): 1,2,4,8; 1,2,5,9; 1,3,6,7. Row headers (common nuclear equivalences): 1|2,6|3|7; 1|2|4,9|5,8; 1|2,6|3,8|4,7. Words shown in the cells: ba, b, aba, ab. Stars mark the $\mathcal{H}$-classes containing an idempotent.]

Figure 7.3: The $\mathcal{D}$-class of rank 4.

We have represented on Figure 7.3 the $\mathcal{D}$-class of elements of rank 4 meeting $\varphi(F)$. It is composed of three $\mathcal{L}$-classes and three $\mathcal{R}$-classes. Each $\mathcal{L}$-class is represented by a column and each $\mathcal{R}$-class by a row. On top of each column, we have indicated the common image of its elements. On the left of each row, we have indicated the common nuclear equivalence of its elements (recall that two elements are equivalent for the nuclear equivalence if and only if they have the same image). The $\mathcal{H}$-classes containing an idempotent are indicated by a star. Each $\mathcal{H}$-class has four elements, and five of them are groups (this happens when the image is a system of representatives of the nuclear equivalence). For instance, the four classes in the nuclear equivalence of $\varphi(ba)$ are $\{1\}$, $\{2\}$, $\{4, 9\}$ and $\{5, 8\}$, and the $\mathcal{H}$-class of (the image of) $ba$ is composed of the following elements:

Word      Permutation
ba        (18)(24)
baaba     (12)(48)
baba      (1)
babaaba   (14)(28)



The structure group of this $\mathcal{D}$-class is the Abelian group $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$. It is the $F$-group of the code.
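The identification of the structure group in Example 7.2.2 can be verified from the table of permutations alone. The following sketch (helper names are ours; permutations act on the right, as in the text) checks that the four listed permutations of $\{1, 2, 4, 8\}$ are closed under composition, commute, and are all involutions, which characterizes $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$ among groups of order 4.

```python
# Verify that the four permutations listed for the H-class of ba form a
# group isomorphic to Z/2Z x Z/2Z (permutations act on the right).

POINTS = (1, 2, 4, 8)

def perm(*cycles):
    """Build a permutation of POINTS (as a dict) from disjoint cycles."""
    p = {x: x for x in POINTS}
    for cycle in cycles:
        for i, x in enumerate(cycle):
            p[x] = cycle[(i + 1) % len(cycle)]
    return p

def mul(p, q):
    """Apply p first, then q."""
    return {x: q[p[x]] for x in POINTS}

table = {
    "ba": perm((1, 8), (2, 4)),
    "baaba": perm((1, 2), (4, 8)),
    "baba": perm(),
    "babaaba": perm((1, 4), (2, 8)),
}
elements = list(table.values())
identity = table["baba"]

closed = all(mul(p, q) in elements for p in elements for q in elements)
all_involutions = all(mul(p, p) == identity for p in elements)
abelian = all(mul(p, q) == mul(q, p) for p in elements for q in elements)

print(closed, all_involutions, abelian)  # prints: True True True
```

The table is also consistent with concatenation of words: composing the permutations of $ba$ and $baaba$ gives the permutation listed for $babaaba$.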

The aim of this section is to prove the following theorem. 



Theorem 7.2.3 Any transitive permutation group of degree $d$ which can be generated by $k$ elements is a syntactic group of a bifix code with $(k - 1)d + 1$ elements.






Theorem 7.2.3 was known before in particular cases. In [?] it is shown that any transitive permutation group is a syntactic group of a finite bifix code. The bound $d + 1$ on the cardinality of the bifix code is proved for the case of a group generated by a $d$-cycle and another permutation. In [?] it is proved that for an Abelian group of rank 2 and order $d$ there exists a bifix code $X$ such that $\mathrm{Card}(X) - 1 = d$. The proof is based on the fact that the Cayley graph of an Abelian group contains a Hamiltonian cycle.

Let us call minimal rank of a group $G$ the minimal cardinality of a generating set for $G$. Theorem 7.2.3 is related to the following conjecture [?].



Let $X$ be a finite bifix code and let $G$ be a transitive permutation group of degree $d$ and minimal rank $k$. If $G$ is a syntactic group of $X$, then $\mathrm{Card}(X) \ge (k - 1)d + 1$.

Theorem 7.2.3 shows that the lower bound is sharp.

The following result, which is from [?], shows that the conjecture holds for $k = 2$.



Theorem 7.2.4 Let $G$ be a permutation group of degree $d$. If $G$ is a nonspecial syntactic group of a finite prefix code $X$, then $\mathrm{Card}(X) \ge d + 1$.

Theorem 7.2.4 is clearly not true for special syntactic groups since $\mathbb{Z}/n\mathbb{Z}$ is a syntactic group of $X = a^n$ for any $n \ge 1$.

Theorem 7.2.3 is a consequence of the following theorem which can be viewed as a complement to Theorem 4.2.11. The proof itself makes use of Theorem 6.2.1.



Theorem 7.2.5 Let $Z \subset A^*$ be a group code of degree $d$. Let $F$ be a Sturmian set. The set $X = Z \cap F$ is an $F$-maximal bifix code of $F$-degree $d$ and $G_F(X) = G(Z)$.

Proof. The fact that $X$ is an $F$-maximal bifix code of $F$-degree $d$ results from Corollary 3.2.S.

Let us show that $G_F(X) = G(Z)$. Let $\mathcal{B} = (R, 1, 1)$ be the minimal automaton of $Z^*$. Set $\psi = \varphi_{\mathcal{B}}$ and $G = \psi(A^*)$. Thus $G$ is a permutation group equivalent to $G(Z)$.

Let $\mathcal{A} = (Q, 1, 1)$ be the minimal automaton of $X^*$. Set $\varphi = \varphi_{\mathcal{A}}$. Denote by $\mathrm{Im}(w)$ the image of $\varphi(w)$ with respect to $\mathcal{A}$. Thus $\mathrm{Im}(w) = \{t \in Q \mid s \cdot w = t \text{ for some } s \in Q\}$.

Let $u \in F$ be a word with $d$ parses with respect to $X$. Let $I = \mathrm{Im}(u)$. By Lemma 7.1.4, the word $u$ has rank $d$ and thus $\mathrm{Card}(I) = d$.

Let $Y = R_F(u)$ be the set of first return words to $u$. By Theorem 6.5.2, the set $Y$ is a basis of the free group $A^\circ$. For any $y \in Y$, the restriction of $\varphi(y)$ to $I$ is a permutation of $I$. Indeed, $uy \in A^*u$ implies $\mathrm{Im}(uy) \subset I$. Since $uy \in F$, the set $\mathrm{Im}(uy)$ has $d$ elements by Lemma 7.1.5. Thus $\mathrm{Im}(uy) = I$. Since $\mathrm{Im}(u) = I$, this proves the claim.

Let $e$ be an idempotent in $\varphi(Y^+)$. The restriction of $e$ to $I$ is the identity. Any long enough element of $\varphi^{-1}(e) \cap Y^*$ has $u$ as a suffix. Thus the image of $e$ is $I$. Moreover, since $\varphi(u)e = \varphi(u)$ and $e \in \varphi(A^*u)$, $e$ and $\varphi(u)$ belong to the same $\mathcal{L}$-class and thus to the same $\mathcal{D}$-class. Thus $e$ belongs to the $\mathcal{D}$-class of $\varphi(A^*)$ which contains the elements of rank $d$ in $\varphi(F)$.

Let $G'$ be the maximal group contained in $\varphi(A^*)$ which contains $e$. It is a permutation group on $I$ which is equivalent to $G_F(X)$.

For $y \in Y^*$, let $\chi(y)$ be the restriction of $\varphi(y)$ to the set $I$.

For any $y \in Y^*$, $e\varphi(y)e$ has the same nuclear equivalence and the same image as $e$. By Proposition 7.1.2 it implies that they are in the same $\mathcal{H}$-class. Thus $e\varphi(y)e$ is in $G'$.

Since $e\varphi(y)e$ and $\varphi(y)$ have the same restriction to $I$ and since $e\varphi(y)e$ belongs to the $\mathcal{H}$-class of $e$, $\chi$ is a morphism from $Y^*$ into the permutation group $G'$. Since $Y$ generates $A^\circ$, this morphism is surjective. Indeed, if $\varphi(w) \in G'$, let $y_1, \ldots, y_n \in Y$ be such that $w = y_1^{\epsilon_1} \cdots y_n^{\epsilon_n}$ with $\epsilon_i = \pm 1$. Then $\chi(w) = \chi(y_1)^{\epsilon_1} \cdots \chi(y_n)^{\epsilon_n}$. Since $G'$ is a finite group, $\chi(y)^{-1} \in \chi(Y^*)$. Thus $\chi(w) \in \chi(Y^*)$.

Let us show that $G$ and $G'$ are equivalent as permutation groups.

For this, let us define a bijection $\beta : I \to R$ as follows. Let $P$ be the set of proper prefixes of the words of $X$ and let $S$ be the set of elements of $P$ which are suffixes of $u$. For $i \in I$, there is a unique $q \in S$ such that $i = 1 \cdot q$ by Lemma 7.1.4. Set $\beta(i) = 1\psi(q)$. We show that $\beta$ is injective. Let $q, t \in S$ be such that $1\psi(q) = 1\psi(t)$. Assume that $|q| \le |t|$. Since $q, t$ are suffix-comparable, we have $t = vq$. Since $1\psi(t) = 1\psi(v)\psi(q) = 1\psi(q)$ and since $\psi(q)$ is a permutation, we have $1\psi(v) = 1$ and thus $v \in Z^*$. Since $u$ is in $F$ and since $Z^* \cap F \subset X^*$, this implies $v \in X^*$ and thus $v = 1$. This shows that $q = t$ and thus that $\beta$ is injective. Since $\mathrm{Card}(R) = \mathrm{Card}(I) = d$, we have shown that $\beta$ is a bijection.

Let us verify that for any $i, j \in I$ and $y \in Y^*$, we have

$$i\varphi(y) = j \iff \beta(i)\psi(y) = \beta(j). \qquad (7.1)$$

Let us first prove (7.1) for $y \in Y$. For this, let $q, t \in S$ be such that $i = 1 \cdot q$, $j = 1 \cdot t$. The words $q, t$ exist by Lemma 7.1.4. Then

$$i\varphi(y) = j \iff 1 \cdot qy = 1 \cdot t \iff qy \in X^*t.$$

The last equivalence holds because $1 \cdot qy = 1 \cdot v$ for the word $v \in P$ such that $qy \in X^*v$. But since $qy \in A^*u$, $v$ is a suffix of $u$ and thus $v \in S$. This forces $t = v$.

Since $qy \in F$, we have

$$qy \in X^*t \iff qy \in Z^*t$$

and thus, we obtain

$$i\varphi(y) = j \iff qy \in Z^*t \iff \beta(i)\psi(y) = \beta(j).$$



This proves (7.1) for $y \in Y$. Next, let us show that if $y, z \in Y^*$ satisfy (7.1) for all $i, j \in I$, the same is true for $yz$. Assume first that for $i, j \in I$, one has $i\varphi(yz) = j$. Since the restrictions of $\varphi(y), \varphi(z)$ to $I$ are permutations, there is a unique $k \in I$ such that $i\varphi(y) = k$ and $k\varphi(z) = j$. Then, since $y, z$ satisfy (7.1), we have $\beta(i)\psi(y) = \beta(k)$ and $\beta(k)\psi(z) = \beta(j)$. Thus $\beta(i)\psi(yz) = \beta(j)$.

Conversely, assume that $\beta(i)\psi(yz) = \beta(j)$. Since $\beta$ is a bijection from $I$ onto $R$, there is a unique $k \in I$ such that $\beta(k) = \beta(i)\psi(y)$. Then $\beta(k)\psi(z) = \beta(j)$. By (7.1), we have $i\varphi(y) = k$ and $k\varphi(z) = j$ whence $i\varphi(yz) = j$. This proves that $yz$ satisfies (7.1).

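Equation (7.1) shows that we may define a morphism $\alpha$ from $G'$ to $G$ by $\alpha(g) = \psi(y)$ for $y \in Y^*$ such that $\chi(y) = g$. This map is injective. Indeed, if $\alpha(g) = \alpha(g')$, let $y, y' \in Y^*$ be such that $\chi(y) = g$ and $\chi(y') = g'$. Then, $\alpha(g) = \psi(y)$ and $\alpha(g') = \psi(y')$ imply that $\psi(y) = \psi(y')$. By (7.1), $\psi(y) = \psi(y')$ implies that $\chi(y) = \chi(y')$ and thus $g = g'$. Since $Y$ generates the free group $A^\circ$, the map is surjective. Indeed, for any $a \in A$ we have $a = y_1^{\epsilon_1} \cdots y_n^{\epsilon_n}$ with $y_i \in Y$ and $\epsilon_i = \pm 1$. Thus $\psi(a) = \psi(y_1)^{\epsilon_1} \cdots \psi(y_n)^{\epsilon_n} = \alpha(g_1^{\epsilon_1} \cdots g_n^{\epsilon_n})$ with $\chi(y_i) = g_i$.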

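Finally, the commutative diagrams of Figure 7.4 show that the pair $(\alpha, \beta)$ is an equivalence of permutation groups.

[Figure 7.4, recovered labels only: the arrow labelled $y$ leads to the state $1 \cdot qy$, and the arrow below it leads from $1\psi(q)$ to $1\psi(qy)$.]

Figure 7.4: The equivalence of $G$ and $G'$.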



Example 7.2.6 We illustrate the proof of Theorem 7.2.5. Let $Z$ be the group code of degree 4 recognized by the automaton of Figure 7.5. It is the automaton of Figure [?] with more convenient labels for the states. It is clear that $G(Z)$ is $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$.

Figure 7.5: A group automaton.

Let $F$ be the Fibonacci set. The code $X = Z \cap F$ is the code of Example 6.6.3. The minimal automaton of $X^*$ is represented on Figure 7.2. Let us choose $u = aba$. It has rank 4, and $\mathrm{Im}(u) = \{1, 2, 4, 8\}$. One gets $Y = \{ba, aba\}$. Next $\chi(ba) = (18)(24)$ and $\chi(aba) = (14)(28)$. The function $\beta$ maps 1, 2, 4, 8 to 1, 2, 3, 4 respectively.
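The data of Example 7.2.6 can be used to see concretely what the equivalence $(\alpha, \beta)$ does. The sketch below (function names are ours) transports $\chi(ba)$ and $\chi(aba)$ along $\beta$ using the defining identity $\beta(rg) = \beta(r)\alpha(g)$. Since the automaton of Figure 7.5 is not reproduced here, the resulting permutations of $\{1, 2, 3, 4\}$ are only what the equivalence predicts for $\psi(ba)$ and $\psi(aba)$; they are not compared against the figure.

```python
# Transport the generators of Example 7.2.6 along beta (right actions).

beta = {1: 1, 2: 2, 4: 3, 8: 4}        # beta : I -> R, as stated in the example
chi_ba = {1: 8, 8: 1, 2: 4, 4: 2}      # the permutation (18)(24)
chi_aba = {1: 4, 4: 1, 2: 8, 8: 2}     # the permutation (14)(28)

def transport(g, beta):
    """Permutation alpha(g) of R defined by beta(r) alpha(g) = beta(r g)."""
    return {beta[r]: beta[g[r]] for r in g}

def cycles(p):
    """Cycle decomposition of p without fixed points, starting from small points."""
    seen, out = set(), []
    for start in sorted(p):
        if start in seen or p[start] == start:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = p[x]
        out.append(tuple(cycle))
    return out

alpha_ba = transport(chi_ba, beta)
alpha_aba = transport(chi_aba, beta)

print(cycles(alpha_ba), cycles(alpha_aba))
# prints: [(1, 4), (2, 3)] [(1, 3), (2, 4)]
```

The two transported permutations are commuting involutions, in agreement with the statement that $G(Z)$ is $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$.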



The following example shows that Theorem 7.2.5 does not hold for the set of factors of an episturmian word which is not strict.

Example 7.2.7 Let $F$ and $X$ be as in Example 5.2.7. The bifix code $X$ has $F$-degree 8. Let $\mathcal{A}$ be the minimal automaton of $X^*$ represented on Figure 5.2. The image of $\varphi(bc)$ is the set $I = \{1, 4, 6, 13, 14, 21, 25, 32\}$. The submonoid $U = \{u \in A^* \mid I \cdot u = I\}$ is generated by $acbc$ and $acacbc$. The restrictions to $I$ of $\varphi(acbc)$ and $\varphi(acacbc)$ are

$$(1\ 14)(25\ 6)(4\ 21)(32\ 13), \qquad (1\ 6)(14\ 25)(4\ 13)(21\ 32).$$

These permutations generate a group which has two orbits: $\{1, 6, 14, 25\}$ and $\{4, 13, 21, 32\}$. The restriction to each orbit is isomorphic to $(\mathbb{Z}/2\mathbb{Z})^2$. Thus the $F$-group of $X$ is $(\mathbb{Z}/2\mathbb{Z})^2$. However $X = Z \cap F$ where $Z$ is a group code such that $G(Z) = (\mathbb{Z}/2\mathbb{Z})^3$.
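The orbit structure claimed in Example 7.2.7 follows from the two permutations alone and can be checked directly. The sketch below (helper names are ours; permutations act on the right) computes the orbits and the order of the generated group.

```python
# Orbits and order of the group generated by the two permutations of
# Example 7.2.7 (permutations act on the right).

POINTS = (1, 4, 6, 13, 14, 21, 25, 32)
INDEX = {x: i for i, x in enumerate(POINTS)}

def perm(*cycles):
    """Permutation of POINTS as a tuple of images, built from disjoint cycles."""
    images = list(POINTS)
    for cycle in cycles:
        for i, x in enumerate(cycle):
            images[INDEX[x]] = cycle[(i + 1) % len(cycle)]
    return tuple(images)

def mul(p, q):
    """Apply p first, then q."""
    return tuple(q[INDEX[x]] for x in p)

def generated_group(gens):
    """Closure of the generators under composition (finite, hence a group)."""
    identity = perm()
    group, todo = {identity}, [identity]
    while todo:
        m = todo.pop()
        for g in gens:
            x = mul(m, g)
            if x not in group:
                group.add(x)
                todo.append(x)
    return group

def orbits(gens):
    """Orbits of the group generated by gens, each as a sorted list."""
    remaining, result = set(POINTS), []
    while remaining:
        orbit, todo = set(), [min(remaining)]
        while todo:
            x = todo.pop()
            if x not in orbit:
                orbit.add(x)
                todo.extend(p[INDEX[x]] for p in gens)
        result.append(sorted(orbit))
        remaining -= orbit
    return result

g = perm((1, 14), (25, 6), (4, 21), (32, 13))
h = perm((1, 6), (14, 25), (4, 13), (21, 32))
group = generated_group([g, h])

print(len(group), orbits([g, h]))
# prints: 4 [[1, 6, 14, 25], [4, 13, 21, 32]]
```

The group has order 4 and all its elements are involutions, so it is $(\mathbb{Z}/2\mathbb{Z})^2$, which is strictly smaller than the group $(\mathbb{Z}/2\mathbb{Z})^3$ of the group code $Z$.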



Proof of Theorem 7.2.3. Let $G$ be a transitive permutation group of degree $d$ and let $Z$ be a group code on an alphabet $A$ with $k$ letters such that $G(Z) = G$. Let $F$ be a Sturmian set on the alphabet $A$ and let $X = Z \cap F$. Then, by Theorem 7.2.5, $G_F(X) = G$ and, by Theorem 3.2.1, $X$ has $(k - 1)d + 1$ elements.